Abstract

The aim of the paper is to establish a convergence theorem for multi-dimensional stochastic
approximation when the "innovations" satisfy some "light" averaging properties in the presence of
a pathwise Lyapunov function. These averaging assumptions allow us to unify apparently remote
frameworks where the innovations are simulated (possibly deterministic like in Quasi-Monte Carlo
simulation) or exogenous (like market data) with ergodic properties. We propose several fields of
applications and illustrate our results on five examples mainly motivated by Finance.

1 Introduction

The aim of this paper is to establish a convergence theorem for multi-dimensional recursive stochastic
approximation in a non-standard framework (compared to the huge literature on this field, see [6],
[13], [19], [4], etc.): we significantly relax our assumption on the innovation process by only asking
for some natural "light" ergodic or simply averaging assumptions, compensated by a reinforcement of
the mean reversion assumption, since we require the existence of a pathwise Lyapunov function.
We will show that this approach unifies seemingly remote settings: those where the innovations are
simulated or even deterministic (quasi-Monte Carlo simulation) and those where the innovations are
exogenous data (like market data). Especially in the latter case, it may be unrealistic to make
overly stringent a priori assumptions on the dynamics of such a data process, like mixing or the
Markov property. On the other hand, the pathwise Lyapunov assumption is definitely an intrinsic
limitation on the kind of problems we can deal with, compared to the procedures extensively
investigated in [6] or more recently in [11], where innovations are Markovian and share mixing properties.

However, we provide several examples, mainly inspired by Finance, to illustrate the fact that the 
field of application of our framework is rather wide and can solve efficiently various kinds of problems, 
some of them having already been considered in the literature. 
Let us be more specific: this paper presents convergence results for $\mathbb{R}^d$-valued stochastic
approximation procedures of Robbins-Monro type (see [36] for the original paper), namely

$$\theta_{n+1} = \theta_n - \gamma_{n+1} H(\theta_n, Y_n), \quad n \ge 0, \quad \theta_0 \in \mathbb{R}^d, \tag{1.1}$$

where $H : \mathbb{R}^d \times \mathbb{R}^q \to \mathbb{R}^d$ is a Borel function, $(\gamma_n)_{n\ge1}$ is a sequence of positive steps and the "innovation"
sequence $(Y_n)_{n\ge0}$ satisfies some "elementary" averaging assumptions ($\theta_0$ is assumed to be deterministic
in this introduction for simplicity). In fact, we will consider a slightly more general setting
which includes an extra noisy term

$$\theta_{n+1} = \theta_n - \gamma_{n+1}\big(H(\theta_n, Y_n) + \Delta M_{n+1}\big), \quad n \ge 0, \tag{1.2}$$

where $(\Delta M_n)_{n\ge1}$ is a sequence of $\mathbb{R}^d$-valued $\mathcal{F}_n$-adapted martingale increments for a filtration $(\mathcal{F}_n)_{n\ge0}$.
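
The recursion (1.2) can be sketched in a few lines. The following is only an illustrative scalar sketch: the names `run_recursion` and `toy_example`, the toy choice $H(\theta, y) = \theta - y$, the uniform innovations and the step $\gamma_n = 1/n$ are assumptions made for the example, not taken from the paper.

```python
import random
from typing import Callable, Iterable, Optional


def run_recursion(
    H: Callable[[float, float], float],
    theta0: float,
    innovations: Iterable[float],
    gamma: Callable[[int], float],
    noise: Optional[Callable[[int], float]] = None,
) -> float:
    """Iterate theta_{n+1} = theta_n - gamma_{n+1} * (H(theta_n, Y_n) + dM_{n+1})."""
    theta = theta0
    for n, y in enumerate(innovations):
        # Optional extra noisy term of (1.2); zero gives back (1.1).
        dm = noise(n + 1) if noise is not None else 0.0
        theta -= gamma(n + 1) * (H(theta, y) + dm)
    return theta


def toy_example(n_steps: int = 20000, seed: int = 0) -> float:
    """Hypothetical toy case: H(theta, y) = theta - y with uniform innovations."""
    rng = random.Random(seed)
    ys = (rng.random() for _ in range(n_steps))  # i.i.d. uniform on [0, 1]
    # With gamma_n = 1/n the iterate is the running mean of the innovations.
    return run_recursion(lambda th, y: th - y, 0.0, ys, lambda n: 1.0 / n)
```

In this toy case the mean function is $h(\theta) = \theta - 1/2$, so the iterate should settle near $1/2$.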

To establish the a.s. convergence of the sequence $(\theta_n)_{n\ge0}$ toward its "target" $\theta^*$ (to be specified later
on), the idea is to make the assumption that the innovation sequence $(Y_n)_{n\ge0}$ satisfies an averaging
property in a "linear" setting: typically that, for a wide enough class $\mathcal{V}$ of integrable functions (with
respect to a probability measure $\nu$),

$$\forall f \in \mathcal{V}, \quad \frac{1}{n}\sum_{k=0}^{n-1} f(Y_k) \longrightarrow \int_{\mathbb{R}^q} f\,d\nu \tag{1.3}$$

at a common rate of convergence to be specified further on. If $\mathcal{V} \supset \mathcal{C}_b(\mathbb{R}^q, \mathbb{R})$, this implies

$$\frac{1}{n}\sum_{k=0}^{n-1} \delta_{Y_k} \underset{n\to\infty}{\Longrightarrow} \nu \quad \text{a.s.}$$

by a separability argument ($\Longrightarrow$ denotes the weak convergence of probability measures). Such a
sequence $(Y_n)_{n\ge0}$ is often called "stable" in the literature, at least when it is a Markov chain. If
$\mathcal{V} = L^1(\nu)$, the sequence $(Y_n)_{n\ge0}$ may be called in short "ergodic", although no true ergodic framework
comes into play at this stage. The target of our recursive procedure (1.2) is then, as expected, a
zero, if any, of the (asymptotic) mean function of the algorithm defined as

$$h(\theta) := \int_{\mathbb{R}^q} H(\theta, y)\,\nu(dy).$$

The key assumption is the existence of a pathwise Lyapunov function with respect to the innovation, i.e.
a function $L$ satisfying

$$\big(\nabla L(\theta) \mid H(\theta, y) - H(\theta^*, y)\big) \ge 0$$

for every $\theta$ and $y$. This assumption may look very stringent but in fact it embodies the standard framework
of Stochastic Approximation with Markov representation of the form (1.1) when $(Y_n)_{n\ge0}$ is i.i.d.
since, under appropriate integrability assumptions, it can be rewritten as follows in canonical form

$$\theta_{n+1} = \theta_n - \gamma_{n+1}\big(\widetilde{H}(\theta_n, Y_n) + \Delta M_{n+1}\big), \quad n \ge 0, \quad \theta_0 \in \mathbb{R}^d,$$

where $\widetilde{H}(\theta, \cdot) = h(\theta)$ and $\Delta M_{n+1} = H(\theta_n, Y_n) - h(\theta_n)$, $n \ge 0$. Then $(\Delta M_n)_{n\ge1}$ is a sequence of
$\sigma(Y_0, \ldots, Y_{n-1})$-martingale increments (under appropriate integrability assumptions). Finally $\widetilde{H}(\theta, \cdot) =
h(\theta)$ does not depend on $y$, so that the above notion of pathwise Lyapunov function reduces to the
standard one. The above canonical form has been extensively investigated (and extended) in many
textbooks on Stochastic Approximation (see [6], [13], [18], [19]).



Our main theorem (Theorem 2.1) lets us retrieve almost entirely the classical results about $L^1$-boundedness
and a.s. convergence of this procedure under the standard Lyapunov assumption. Many
extensions have been developed when $(Y_n)_{n\ge0}$ or even $(\theta_n, Y_n)_{n\ge0}$ has Markovian dynamics (see
the seminal textbook [6] and more recent contributions like [11] and several references therein). The
main constraint induced by such an approach is that it requires the existence of a solution to the
Poisson equation related to this chain, together with assumptions on that solution.

As a first field of applications, we are interested in quasi-random numbers. The original idea
of replacing i.i.d. innovations by uniformly distributed sequences (with low discrepancy) in recursive
stochastic approximation procedures goes back to the early 1990's in [24], leading to "Quasi-Stochastic
Approximation" (QSA, by analogy with QMC for Quasi-Monte Carlo). The framework in [24] was purely
one-dimensional, whereas many numerical tests have proved the efficiency of QSA in a multi-dimensional
setting. The aim is to establish a convergence theorem in this higher dimensional setting under
natural regularity assumptions (i.e. based on Lipschitz regularity rather than finite variation in the
Hardy & Krause or in the measure sense, often encountered in the QMC world). As concerns the low
discrepancy sequences, our framework is probably close to the most general one to get pointwise a.s.
convergence of stochastic approximation.

As a second setting, we consider the case where $(Y_n)_{n\ge0}$ is a functional of an $\alpha$-mixing process satisfying
a priori no Markov assumption. These processes are stationary and dependent, hence more realistic for
modelling inputs made of real data. To describe the class of functions $\mathcal{V}$ we need to prove the convergence
of the series of covariance coefficients of the innovations. To this end we use some results in [34] and
the covariance inequality for $\alpha$-mixing processes (see [12]). Next, with the probabilistic version of the
Gal-Koksma theorem (see [16] and [1, 2, 3]), we prove that this class is large enough ($L^{2+\delta}(\nu) \subset \mathcal{V}$,
$\delta > 0$). Finally we examine the case of a homogeneous Markov chain with (unique) invariant distribution
$\nu$. Several convergence results of stochastic approximation have been proved in this setting in [6], but
they all rely on the existence (and some regularity properties) of a solution to the Poisson equation. To
describe $\mathcal{V}$ we add an assumption on the transition of the chain which allows us to prove that this
class does not depend on the initial value of the chain.

Finally we propose several examples of applications illustrated with numerical experiments. They
fall into two classes: the first one is devoted to simulated innovations (i.e. Numerical Probability
methods) and the second one deals with applications involving real data. First we present a
simple case of calibration: the search, for a derivative product in a financial model, of an implicit model
parameter fitting its market value. We implement the algorithm with both an i.i.d. sequence
and a quasi-Monte Carlo sequence to compare their respective rates of convergence. The second
example is devoted to the recursive computation of risk measures commonly considered in energy
portfolio management: the Value-at-Risk and the Conditional Value-at-Risk. We design a stochastic
gradient and a companion procedure to compute risk measures (like in [5, 15]) and we show that they
can be successfully implemented in a QSA framework. In the third example, we solve numerically
a "toy" long term investment problem leading to a static potential minimization derived from an
ergodic control problem (see [30]). The potential is related to the invariant measure of a diffusion so
that the innovation relies on (inhomogeneous Markov) Euler schemes with decreasing step introduced
in [20] (see also [27]). These three stochastic approximation procedures rely on simulated data. The
fourth example is the so-called two-armed bandit introduced in learning automata and mathematical
psychology in the 1950's (see [31]). Its a.s. behaviour in the i.i.d. setting has been extensively
investigated in [23] and [21] and then partially extended in [38] to a more general ergodic framework.
We show how the starting point of this extension appears as a consequence of our multiplicative case
(Theorem 2.2). The last example describes a model of asset allocation across liquidity pools fully
developed in [26] involving exogenous real market data, a priori sharing no Markov property but on
which an averaging assumption seems natural (at least within a medium lapse of time).



The paper is organized as follows: Section 2 states and proves the two main results, Theorem
2.1 and its counterpart for multiplicative noise, Theorem 2.2. Section 3 is devoted to quasi-Stochastic
Approximation, i.e. the case where the innovation process is a uniformly distributed deterministic
sequence over $[0, 1]^q$. Section 4 is devoted to applications to random innovations, namely additive
noise, mixing processes (functionals of $\alpha$-mixing processes) and ergodic homogeneous Markov chains. Section
5 presents five examples of applications including numerical illustrations, mostly in connection with
Finance: implicit correlation search, recursive computation of VaR and CVaR, long term investment
evaluation, two-armed bandit algorithm and optimal allocation problem (more developed in [26]).

Notations. $(\,\cdot \mid \cdot\,)$ denotes the canonical Euclidean inner product and $|\cdot|$ its related norm on $\mathbb{R}^d$. The
almost sure convergence will be denoted by $\overset{a.s.}{\longrightarrow}$ and $\Longrightarrow$ will denote the weak convergence of probability
measures on $(\mathbb{R}^q, \mathcal{B}or(\mathbb{R}^q))$. $\Delta a_n = a_n - a_{n-1}$ for every sequence $(a_n)_n$.

2 Algorithm design and main theoretical result 

In this paper, we consider recursive stochastic algorithms of the following general form

$$\theta_{n+1} = \theta_n - \gamma_{n+1}\big(H(\theta_n, Y_n) + \Delta M_{n+1}\big), \quad n \ge 0, \tag{2.4}$$

where $(Y_n)_{n\ge0}$ is an $\mathbb{R}^q$-valued sequence of $\mathcal{F}_n$-adapted random variables and $(\Delta M_n)_{n\ge1}$ is a sequence
of $\mathcal{F}_n$-adapted martingale increments, all defined on the same filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n\ge0}, \mathbb{P})$.
Moreover $\theta_0 \in L^1(\Omega, \mathcal{F}_0, \mathbb{P})$ and $\theta_0$ is independent of $(Y_n, \Delta M_{n+1})_{n\ge0}$. The positive step sequence
$(\gamma_n)_{n\ge1}$ is non-increasing and $H$ is a Borel function from $\mathbb{R}^d \times \mathbb{R}^q$ to $\mathbb{R}^d$.

In the following, we adopt a kind of compromise by assuming that $(Y_n)_{n\ge0}$ is a process satisfying
some averaging properties and that the function $H(\theta^*, \cdot)$ belongs to a class of functions (to be specified
further on) for which a rate of convergence (a.s. and in $L^p$) holds in (1.3). Moreover we need to reinforce
the Lyapunov condition on the pseudo-mean function $H$, which limits, at least theoretically, the range
of application of the method.

2.1 Framework and assumptions 

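Let $(Y_n)_{n\ge0}$ be a sequence of $\mathbb{R}^q$-valued random variables. We will say that the sequence $(Y_n)_{n\ge0}$
satisfies a $\nu$-stability assumption, or equivalently is $\nu$-averaging, if

$$\mathbb{P}(d\omega)\text{-a.s.} \qquad \frac{1}{n}\sum_{k=0}^{n-1} \delta_{Y_k(\omega)} \underset{n\to\infty}{\Longrightarrow} \nu. \tag{2.5}$$

We will see that the stochastic approximation procedure defined by (2.4) is a recursive zero search
of the (asymptotic) mean function

$$h(\theta) := \int_{\mathbb{R}^q} H(\theta, y)\,\nu(dy). \tag{2.6}$$

Let $p \in [1, \infty)$ and let $(\varepsilon_n)_{n\ge0}$ be a sequence of nonnegative numbers such that

$$\varepsilon_n \underset{n\to\infty}{\longrightarrow} 0 \quad \text{and} \quad \liminf_n \, n\varepsilon_n > 0. \tag{2.7}$$

We denote by $\mathcal{V}_{\varepsilon_n,p}$ the class of functions whose convergence rate in (1.3), in both the a.s. and the $L^p(\mathbb{P})$
sense, is $\varepsilon_n^{-1}$, namely

$$\mathcal{V}_{\varepsilon_n,p} := \Big\{ f \in L^p(\nu) \ \Big|\ \frac{1}{n}\sum_{k=0}^{n-1} f(Y_k) - \int f\,d\nu = O(\varepsilon_n) \quad \mathbb{P}\text{-a.s. and in } L^p(\mathbb{P}) \Big\}. \tag{2.8}$$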

2.2 Main result 

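Now we are in a position to state an a.s. convergence theorem "à la" Robbins-Siegmund.

Theorem 2.1. (a) Boundedness. Let $h : \mathbb{R}^d \to \mathbb{R}^d$ satisfying (2.6), $H : \mathbb{R}^d \times \mathbb{R}^q \to \mathbb{R}^d$ a Borel function
and let $(Y_n)_{n\ge0}$ be a $\nu$-stable sequence (i.e. satisfying (2.5)). Assume there exists a continuously
differentiable function $L : \mathbb{R}^d \to \mathbb{R}_+$ satisfying

$$\nabla L \text{ is Lipschitz continuous and } |\nabla L|^2 \le C(1 + L) \tag{2.9}$$

and that the pseudo-mean function $H$ satisfies the pathwise Lyapunov assumption

$$\forall \theta \in \mathbb{R}^d \setminus \{\theta^*\}, \ \forall y \in \mathbb{R}^q, \quad \big(\nabla L(\theta) \mid H(\theta, y) - H(\theta^*, y)\big) \ge 0. \tag{2.10}$$

Let $p \in [1, \infty)$ and let $(\varepsilon_n)_{n\ge1}$ be a sequence satisfying (2.7). Assume that

$$H(\theta^*, \cdot) \in \mathcal{V}_{\varepsilon_n,p}. \tag{2.11}$$

Moreover, assume that $H$ satisfies the following (quasi-)linear growth assumption

$$\forall \theta \in \mathbb{R}^d, \ \forall y \in \mathbb{R}^q, \quad |H(\theta, y)| \le C_H\,\phi(y)\,(1 + L(\theta))^{1/2} \tag{2.12}$$

and that the martingale increments sequence $(\Delta M_n)_{n\ge1}$ satisfies for every $n \ge 0$,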

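$$\begin{cases} \mathbb{E}\big(|\Delta M_{n+1}|^{2\vee\frac{p}{p-1}} \,\big|\, \mathcal{F}_n\big) \le C_M\,\phi(Y_n)^{2\vee\frac{p}{p-1}}\,(1 + L(\theta_n))^{\frac{1}{2}\left(2\vee\frac{p}{p-1}\right)} & \text{if } p > 1, \\[2mm] \sup_{n\ge0} \Big\| \dfrac{\Delta M_{n+1}}{(1 + L(\theta_n))^{1/2}} \Big\|_\infty < +\infty & \text{if } p = 1, \end{cases} \tag{2.13}$$

where $C_M$ is a positive real constant and $\sup_{n\ge0} \|\phi(Y_n)\|_{2\vee\frac{p}{p-1}} < +\infty$.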

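Let $\gamma = (\gamma_n)_{n\ge1}$ be a nonnegative non-increasing sequence of "admissible" gain parameters satisfying

$$\sum_{n\ge1} \gamma_n = +\infty, \qquad n\varepsilon_n\gamma_n \underset{n\to\infty}{\longrightarrow} 0, \qquad \text{and} \qquad \sum_{n\ge1} n\varepsilon_n \max\big(\gamma_n^2, |\Delta\gamma_{n+1}|\big) < +\infty. \tag{2.14}$$

Then, the recursive procedure defined by (2.4) satisfies: $(L(\theta_n))_{n\ge0}$ is $L^1$-bounded, $L(\theta_n) \underset{n\to\infty}{\longrightarrow} L_\infty <
+\infty$ a.s., $\Delta\theta_n \to 0$ a.s. and

$$\sum_{n\ge1} \gamma_{n+1}\big(\nabla L(\theta_n) \mid H(\theta_n, Y_n) - H(\theta^*, Y_n)\big) < +\infty \quad \text{a.s.}$$

(b) A.s. convergence toward $\theta^*$. Furthermore, if $\{\theta^*\}$ is a connected component of $\{L = L(\theta^*)\}$ and
the pseudo-mean function $H$ satisfies the strict pathwise Lyapunov assumption

$$\forall \delta > 0, \ \forall \theta \in \mathbb{R}^d \setminus \{\theta^*\}, \ \forall y \in \mathbb{R}^q, \quad \big(\nabla L(\theta) \mid H(\theta, y) - H(\theta^*, y)\big) \ge \chi_\delta(y)\,\Psi_\delta(\theta) \tag{2.15}$$

where $\nu(\chi_\delta) > 0$, $\Psi_\delta$ is l.s.c. and positive on $\mathbb{R}^d \setminus \{\theta^*\}$ and $\bigcap_{\delta>0}\{\Psi_\delta = 0\} = \{\theta^*\}$, then

$$\theta_n \overset{a.s.}{\longrightarrow} \theta^*.$$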
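Remark. The conditions on the step sequence $\gamma = (\gamma_n)_{n\ge1}$ and $(\varepsilon_n)_{n\ge1}$ are satisfied for example by

$$\varepsilon_n = n^{-\beta}, \ \beta \in (0, 1], \quad \text{and} \quad \gamma_n = \frac{c}{n^a}, \ 1 - \frac{\beta}{2} < a \le 1, \ c > 0. \tag{2.16}$$

Indeed, the summability of $n\varepsilon_n\gamma_n^2 = c^2 n^{1-\beta-2a}$ in (2.14) is the binding constraint and requires $a > 1 - \beta/2$.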
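
For power-law choices, the conditions (2.14) reduce to comparisons of exponents, which can be checked mechanically. The sketch below is illustrative only: the function name `is_admissible` is hypothetical, and it assumes $\varepsilon_n = n^{-\beta}$, $\gamma_n = c\,n^{-a}$ with $c > 0$, so that $|\Delta\gamma_{n+1}|$ behaves like $n^{-a-1}$.

```python
def is_admissible(beta: float, a: float) -> bool:
    """Check (2.14) for eps_n = n**(-beta) and gamma_n = c * n**(-a), c > 0.

    Each condition is turned into an inequality on exponents, using that
    sum n**s is finite iff s < -1 and that n**s -> 0 iff s < 0.
    """
    if not (0.0 < beta <= 1.0) or a <= 0.0:
        return False
    steps_diverge = a <= 1.0                           # sum gamma_n = +infinity
    product_vanishes = 1.0 - beta - a < 0.0            # n * eps_n * gamma_n -> 0
    squares_summable = 1.0 - beta - 2.0 * a < -1.0     # sum n * eps_n * gamma_n**2 finite
    increments_summable = 1.0 - beta - (a + 1.0) < -1.0  # sum n * eps_n * |delta gamma| finite
    return (steps_diverge and product_vanishes
            and squares_summable and increments_summable)
```

For instance, with $\beta = 1/2$ the exponent $a = 0.8$ passes all four checks while $a = 0.7$ fails the square-summability check.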



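Proof. First step: We introduce the function

$$\Lambda(\theta) := \sqrt{1 + L(\theta)}$$

as a Lyapunov function instead of $L(\theta)$ like in the classical case. It follows from the fundamental
formula of calculus that there exists $\xi_{n+1} \in (\theta_n, \theta_{n+1})$ such that

$$\begin{aligned} \Lambda(\theta_{n+1}) &= \Lambda(\theta_n) + \big(\nabla\Lambda(\theta_n) \mid \Delta\theta_{n+1}\big) + \big(\nabla\Lambda(\xi_{n+1}) - \nabla\Lambda(\theta_n) \mid \Delta\theta_{n+1}\big) \\ &\le \Lambda(\theta_n) + \big(\nabla\Lambda(\theta_n) \mid \Delta\theta_{n+1}\big) + |\nabla\Lambda(\xi_{n+1}) - \nabla\Lambda(\theta_n)|\,|\Delta\theta_{n+1}|. \end{aligned}$$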

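Lemma 2.1. The new Lyapunov function $\Lambda$ satisfies the two following properties:
(i) $\nabla\Lambda$ is bounded (so that $\Lambda$ is Lipschitz).

(ii) $\forall \theta, \theta' \in \mathbb{R}^d$, $\quad |\nabla\Lambda(\theta') - \nabla\Lambda(\theta)| \le C_L \dfrac{|\theta' - \theta|}{\Lambda(\theta)}$.

Proof of Lemma 2.1. (i) $\nabla\Lambda = \dfrac{\nabla L}{2\sqrt{1+L}}$ is bounded by (2.9), consequently $\Lambda$ is Lipschitz.

(ii) Let $\theta, \theta' \in \mathbb{R}^d$,

$$\begin{aligned} |\nabla\Lambda(\theta) - \nabla\Lambda(\theta')| &\le \frac{|\nabla L(\theta) - \nabla L(\theta')|}{2\sqrt{1 + L(\theta)}} + \frac{|\nabla L(\theta')|}{2} \left| \frac{1}{\sqrt{1 + L(\theta')}} - \frac{1}{\sqrt{1 + L(\theta)}} \right| \\ &\le \frac{[\nabla L]_{Lip}}{2\sqrt{1 + L(\theta)}}\,|\theta - \theta'| + \frac{|\nabla L(\theta')|}{2\sqrt{1 + L(\theta')}}\,\frac{[\Lambda]_{Lip}\,|\theta - \theta'|}{\Lambda(\theta)} \\ &\le C_L \frac{|\theta - \theta'|}{\Lambda(\theta)}. \qquad \Box \end{aligned}$$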
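
As a quick sanity check of Lemma 2.1 (i), consider the quadratic choice $L(\theta) = |\theta - \theta^*|^2$. This choice is an illustrative assumption, not an example taken from the text; it is a standard instance in which $\nabla L$ is Lipschitz and $|\nabla L|^2$ is dominated by $1 + L$:

```latex
\begin{align*}
L(\theta) &= |\theta-\theta^*|^2, \qquad
\nabla L(\theta) = 2(\theta-\theta^*), \qquad
|\nabla L(\theta)|^2 = 4L(\theta) \le 4\,(1+L(\theta)),\\
\Lambda(\theta) &= \sqrt{1+|\theta-\theta^*|^2}, \qquad
\nabla\Lambda(\theta) = \frac{\nabla L(\theta)}{2\sqrt{1+L(\theta)}}
 = \frac{\theta-\theta^*}{\sqrt{1+|\theta-\theta^*|^2}}, \qquad
|\nabla\Lambda(\theta)| \le 1 .
\end{align*}
```

So in this case $\Lambda$ is 1-Lipschitz, which is the kind of linear growth control that the change of Lyapunov function is designed to provide.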



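Thus, applying the above lemma to $\theta = \theta_n$ and $\theta' = \xi_{n+1}$, and noting that $|\xi_{n+1} - \theta_n| \le |\Delta\theta_{n+1}|$, yields

$$\begin{aligned} \Lambda(\theta_{n+1}) &\le \Lambda(\theta_n) - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid H(\theta_n, Y_n)\big) - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid \Delta M_{n+1}\big) + C_L \frac{|\Delta\theta_{n+1}|^2}{\sqrt{1 + L(\theta_n)}} \\ &= \Lambda(\theta_n) - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid H(\theta_n, Y_n) - H(\theta^*, Y_n)\big) - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid H(\theta^*, Y_n)\big) \\ &\quad - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid \Delta M_{n+1}\big) + C_L \gamma_{n+1}^2 \frac{|H(\theta_n, Y_n) + \Delta M_{n+1}|^2}{\sqrt{1 + L(\theta_n)}}. \end{aligned}$$

We have for every $n \ge 0$,

$$\big|\gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid H(\theta^*, Y_n)\big)\big| \le C_\Lambda\,\gamma_{n+1}\,\phi(Y_n) \in L^1$$

since $\nabla\Lambda$ is bounded. Besides $\mathbb{E}\big[(\nabla\Lambda(\theta_n) \mid \Delta M_{n+1}) \mid \mathcal{F}_n\big] = 0$, $n \ge 0$, since $\Delta M_n$ is a true martingale
increment and $\nabla\Lambda$ is bounded. Furthermore, owing to (2.12) and (2.13),

$$\mathbb{E}\left[ \frac{|H(\theta_n, Y_n) + \Delta M_{n+1}|^2}{\sqrt{1 + L(\theta_n)}} \,\Big|\, \mathcal{F}_n \right] \le C\,\phi(Y_n)^2\,\Lambda(\theta_n)$$

(where conditional expectation is extended to positive random variables). Consequently,

$$\begin{aligned} \mathbb{E}\big[\Lambda(\theta_{n+1}) \mid \mathcal{F}_n\big] &\le \Lambda(\theta_n)\big(1 + C'_L\gamma_{n+1}^2\phi(Y_n)^2\big) - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid H(\theta_n, Y_n) - H(\theta^*, Y_n)\big) \\ &\quad - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid H(\theta^*, Y_n)\big). \end{aligned} \tag{2.17}$$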



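We set $V_n := \dfrac{A_n}{B_n}$, $n \ge 0$, where

$$A_n := \Lambda(\theta_n) + \sum_{k=0}^{n-1} \gamma_{k+1}\big(\nabla\Lambda(\theta_k) \mid H(\theta_k, Y_k) - H(\theta^*, Y_k)\big), \qquad B_n := \prod_{k=1}^{n} \big(1 + C'_L\gamma_k^2\phi(Y_{k-1})^2\big).$$

The mean-reverting assumption (2.15) implies that $(A_n)_{n\ge0}$ is a nonnegative process, and $B_n$ is
$\mathcal{F}_{n-1}$-adapted, $n \ge 1$. Elementary computations first show that

$$\mathbb{E}\big[A_{n+1} \mid \mathcal{F}_n\big] \le A_n \frac{B_{n+1}}{B_n} - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid H(\theta^*, Y_n)\big)$$

which finally yields

$$\forall n \ge 0, \quad \mathbb{E}\big[V_{n+1} \mid \mathcal{F}_n\big] \le V_n - \Delta W_{n+1}, \tag{2.18}$$

where $W_n := \sum_{k=0}^{n-1} \widetilde\gamma_{k+1}\big(\nabla\Lambda(\theta_k) \mid H(\theta^*, Y_k)\big)$ with $\widetilde\gamma_n := \dfrac{\gamma_n}{B_n}$, $n \ge 1$.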

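Second step: Now our aim is to prove that the sequence $(W_n)_{n\ge0}$ is $L^1$-bounded and a.s. converges.
To this end we set $S^*_n := \sum_{k=0}^{n-1} H(\theta^*, Y_k)$; then an Abel transform yields

$$W_n = \sum_{k=0}^{n-1} \widetilde\gamma_{k+1}\big(\nabla\Lambda(\theta_k) \mid \Delta S^*_{k+1}\big) = \widetilde\gamma_n\big(\nabla\Lambda(\theta_{n-1}) \mid S^*_n\big) - \sum_{k=1}^{n-1} \big(S^*_k \mid \widetilde\gamma_{k+1}\nabla\Lambda(\theta_k) - \widetilde\gamma_k\nabla\Lambda(\theta_{k-1})\big).$$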

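First, since $\nabla\Lambda$ is bounded, note that

$$\widetilde\gamma_n\,|\nabla\Lambda(\theta_{n-1})|\,|S^*_n| \le \|\nabla\Lambda\|_\infty\, n\varepsilon_n\widetilde\gamma_n\,\frac{|S^*_n|}{n\varepsilon_n} \le \|\nabla\Lambda\|_\infty\, n\varepsilon_n\gamma_n\,\frac{|S^*_n|}{n\varepsilon_n},$$

which a.s. goes to 0 as $n$ goes to infinity since $n\varepsilon_n\gamma_n \to 0$ by (2.14) and $\Big(\dfrac{|S^*_n|}{n\varepsilon_n}\Big)_{n\ge1}$ remains a.s.
bounded. Moreover

$$\mathbb{E}\big[\widetilde\gamma_n\,|\nabla\Lambda(\theta_{n-1})|\,|S^*_n|\big] \le n\varepsilon_n\gamma_n\,\|\nabla\Lambda\|_\infty \left\| \frac{S^*_n}{n\varepsilon_n} \right\|_1,$$

so that this term converges to 0 in $L^1$ because $n\varepsilon_n\gamma_n \to 0$ and $H(\theta^*, \cdot) \in \mathcal{V}_{\varepsilon_n,p}$. On the other hand,

$$\sum_{k=1}^{n-1} \big(S^*_k \mid \widetilde\gamma_{k+1}\nabla\Lambda(\theta_k) - \widetilde\gamma_k\nabla\Lambda(\theta_{k-1})\big) = \sum_{k=1}^{n-1} \big(S^*_k \mid \nabla\Lambda(\theta_k)\big)\,\Delta\widetilde\gamma_{k+1} + \sum_{k=1}^{n-1} \widetilde\gamma_k\big(S^*_k \mid \nabla\Lambda(\theta_k) - \nabla\Lambda(\theta_{k-1})\big).$$

As $\nabla\Lambda = \dfrac{\nabla L}{2\sqrt{1+L}}$ is bounded by construction, we have

$$\sum_{k=1}^{n} \big|\Delta\widetilde\gamma_{k+1}\big(S^*_k \mid \nabla\Lambda(\theta_k)\big)\big| \le \sum_{k=1}^{n} |\Delta\widetilde\gamma_{k+1}|\,|S^*_k|\,\|\nabla\Lambda\|_\infty \le \|\nabla\Lambda\|_\infty \sum_{k=1}^{n} k\varepsilon_k\,|\Delta\widetilde\gamma_{k+1}|\,\frac{|S^*_k|}{k\varepsilon_k}.$$

Now, using that $\dfrac{a}{1+a} \le \sqrt{a}$, $a \ge 0$, and that $B_k \ge 1$,

$$|\Delta\widetilde\gamma_{k+1}| \le |\Delta\gamma_{k+1}| + \gamma_k \left( \frac{1}{B_k} - \frac{1}{B_{k+1}} \right) \le |\Delta\gamma_{k+1}| + \gamma_k \frac{C'_L\gamma_{k+1}^2\phi(Y_k)^2}{1 + C'_L\gamma_{k+1}^2\phi(Y_k)^2} \le |\Delta\gamma_{k+1}| + \sqrt{C'_L}\,\gamma_k\gamma_{k+1}\phi(Y_k).$$

Hence

$$\sum_{k=1}^{n} \big|\Delta\widetilde\gamma_{k+1}\big(S^*_k \mid \nabla\Lambda(\theta_k)\big)\big| \le \|\nabla\Lambda\|_\infty \left( \sum_{k=1}^{n} k\varepsilon_k\,|\Delta\gamma_{k+1}|\,\frac{|S^*_k|}{k\varepsilon_k} + \sqrt{C'_L} \sum_{k=1}^{n} k\varepsilon_k\gamma_k\gamma_{k+1}\,\phi(Y_k)\,\frac{|S^*_k|}{k\varepsilon_k} \right).$$

By Hölder's inequality,

$$\mathbb{E}\left( \phi(Y_k)\,\frac{|S^*_k|}{k\varepsilon_k} \right) \le \|\phi(Y_k)\|_{\frac{p}{p-1}} \left\| \frac{S^*_k}{k\varepsilon_k} \right\|_p.$$

As $\Big( \Big\| \dfrac{S^*_k}{k\varepsilon_k} \Big\|_p \Big)_{k\ge1}$ is bounded, $\gamma$ is admissible and $\sup_{k\ge0} \|\phi(Y_k)\|_{\frac{p}{p-1}} < +\infty$, the series $\sum_{k\ge1} \Delta\widetilde\gamma_{k+1}\big(S^*_k \mid \nabla\Lambda(\theta_k)\big)$
is absolutely converging in $L^1(\mathbb{P})$.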

n 

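We now study the series $\sum_{k=1}^{n} \widetilde\gamma_k\big(S^*_k \mid \nabla\Lambda(\theta_k) - \nabla\Lambda(\theta_{k-1})\big)$. By Lemma 2.1 (ii), we have

$$|\nabla\Lambda(\theta_k) - \nabla\Lambda(\theta_{k-1})| \le C_L \frac{|\Delta\theta_k|}{\sqrt{1 + L(\theta_{k-1})}} \le C_L\,\gamma_k\,\frac{|H(\theta_{k-1}, Y_{k-1})| + |\Delta M_k|}{\sqrt{1 + L(\theta_{k-1})}}.$$

Since $\widetilde\gamma_k \le \gamma_k$, we are interested in the $L^1$-convergence of the series

$$\sum_{k\ge1} \gamma_k^2\,|S^*_k|\,\frac{|H(\theta_{k-1}, Y_{k-1})| + |\Delta M_k|}{\sqrt{1 + L(\theta_{k-1})}}.$$

For the first sum, as $\dfrac{|H(\theta_{k-1}, Y_{k-1})|}{\sqrt{1 + L(\theta_{k-1})}} \le C_H\,\phi(Y_{k-1})$, we then come to $\sum_{k\ge1} C_H\gamma_k^2\,\mathbb{E}\big[|S^*_k|\,|\phi(Y_{k-1})|\big]$ and
by Hölder's inequality we obtain

$$\mathbb{E}\big[|S^*_k|\,|\phi(Y_{k-1})|\big] \le \|S^*_k\|_p\,\|\phi(Y_{k-1})\|_{\frac{p}{p-1}} < +\infty$$

because $\|S^*_k\|_p = O(k\varepsilon_k)$ by (2.8) and $\sup_n \|\phi(Y_n)\|_{\frac{p}{p-1}} < +\infty$. Furthermore, as $\sum_k k\varepsilon_k\gamma_k^2 < +\infty$ by
(2.14), the series $\sum_{k\ge1} \gamma_k^2\,|S^*_k|\,\dfrac{|H(\theta_{k-1}, Y_{k-1})|}{\sqrt{1 + L(\theta_{k-1})}}$ converges in $L^1$.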

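For the second sum, we derive from Hölder's inequality (with $p$ and $\frac{p}{p-1}$) and (2.13) that

$$\mathbb{E}\left[ |S^*_k|\,\frac{|\Delta M_k|}{\sqrt{1 + L(\theta_{k-1})}} \right] \le \|S^*_k\|_p \left\| \frac{\Delta M_k}{\sqrt{1 + L(\theta_{k-1})}} \right\|_{\frac{p}{p-1}} \le C_M\,\|S^*_k\|_p\,\|\phi(Y_{k-1})\|_{\frac{p}{p-1}\vee 2} < +\infty.$$

This yields that $\sum_{k\ge1} \gamma_k^2\,|S^*_k|\,\dfrac{|\Delta M_k|}{\sqrt{1 + L(\theta_{k-1})}}$ converges in $L^1$ too. Finally we then obtain that $W_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} W_\infty$
and $\sup_{n\ge0} \|W_n\|_1 < +\infty$. Thus, since $V_n \ge 0$, we have that

$$(V_n + W_n)^- \le W_n^- \le |W_n| \in L^1(\mathbb{P}) \quad \text{since} \quad \sup_{n\ge0} \|W_n\|_1 < +\infty.$$

As $V_0 = \Lambda(\theta_0) \le C(1 + |\theta_0|) \in L^1$, it follows by induction from (2.18) that, for every $n \ge 0$, $\mathbb{E}\,V_n < +\infty$.
Hence $S_n := V_n + W_n$, $n \ge 0$, is a true supermartingale with an $L^1$-bounded negative part. We then
deduce that

$$S_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} S_\infty \in L^1.$$

Now $W_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} W_\infty$ implies $V_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} V_\infty < +\infty$ a.s.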

n— >oo 



a.s. 

n—^ca 



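Third step: Now we show that the product $B_n$ converges a.s., in order to derive that $A_n$ converges a.s.
In fact

$$\sum_{n\ge1} \gamma_n^2\,\phi(Y_{n-1})^2 < +\infty \quad \text{a.s.},$$

since $\sup_{n\ge0} \mathbb{E}\big[\phi^2(Y_n)\big] < +\infty$ and $\sum_{n\ge1} \gamma_n^2 < +\infty$ by combining (2.7) and (2.14), which in turn implies
that $B_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} B_\infty < +\infty$. As a consequence $A_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} A_\infty < +\infty$. Therefore, using the mean-reverting
property (2.15) of $H$ with respect to $\nabla\Lambda$, we classically derive that

$$\sum_{n\ge1} \gamma_n\big(\nabla\Lambda(\theta_{n-1}) \mid H(\theta_{n-1}, Y_{n-1}) - H(\theta^*, Y_{n-1})\big) < +\infty \quad \text{a.s.} \tag{2.19}$$

Consequently $\Lambda(\theta_n) \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} \Lambda_\infty < +\infty$ a.s.
As $\lim_{|\theta|\to+\infty} L(\theta) = +\infty$, we have $\lim_{|\theta|\to+\infty} \Lambda(\theta) = +\infty$; the sequence $(\theta_n)_{n\ge0}$ is then a.s. bounded and

$$L(\theta_n) \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} L_\infty < +\infty \quad \text{a.s.}$$

Now let us show that $\Delta\theta_n \to 0$. In fact, $|\Delta\theta_{n+1}|^2 \le C\gamma_{n+1}^2\big(|H(\theta_n, Y_n)|^2 + |\Delta M_{n+1}|^2\big)$, so that

$$\mathbb{E}\big[|\Delta\theta_{n+1}|^2 \mid \mathcal{F}_n\big] \le C\gamma_{n+1}^2\,\phi(Y_n)^2\,(1 + L(\theta_n))$$

and, $(L(\theta_n))_{n\ge0}$ being a.s. bounded,

$$\sum_{n\ge0} \mathbb{E}\big[|\Delta\theta_{n+1}|^2 \mid \mathcal{F}_n\big] < +\infty \quad \text{a.s.},$$

which classically implies that $\sum_{n\ge0} |\Delta\theta_{n+1}|^2 < +\infty$ a.s., hence $\Delta\theta_n \to 0$ a.s.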

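Fourth step: To prove the convergence of $\theta_n$ toward $\theta^*$, we use Assumptions (2.15) and (2.19) to
deduce that

$$\sum_{n\ge1} \gamma_n\,\chi_\delta(Y_{n-1})\,\Psi_\delta(\theta_{n-1}) < +\infty \quad \text{a.s.} \tag{2.20}$$

Now,

$$\sum_{k=0}^{n} \gamma_{k+1}\chi_\delta(Y_k) = \sum_{k=0}^{n} \gamma_{k+1}\Delta S^\delta_k = \gamma_{n+1}S^\delta_n - \sum_{k=0}^{n-1} \Delta\gamma_{k+2}\,S^\delta_k$$

where $S^\delta_n = \sum_{k=0}^{n} \chi_\delta(Y_k)$ and we set $S^\delta_{-1} = 0$, so that $\Delta S^\delta_0 = S^\delta_0$.

By Assumption (2.5), $\dfrac{S^\delta_n}{n} \to \nu(\chi_\delta) > 0$ as $n \to \infty$. Let $n_0$ be the smallest integer such that

$$\forall n \ge n_0, \quad \frac{S^\delta_n}{n} \ge \varepsilon_0 = \frac{\nu(\chi_\delta)}{2} > 0.$$

Then, a standard discrete integration by parts yields

$$\forall n \ge n_0, \quad \sum_{k=n_0}^{n} \gamma_{k+1}\chi_\delta(Y_k) = n\gamma_{n+1}\frac{S^\delta_n}{n} - C_{n_0} + \sum_{k=n_0}^{n-1} k(-\Delta\gamma_{k+2})\frac{S^\delta_k}{k} \quad \text{a.s.},$$

where $C_{n_0} = \gamma_{n_0+1}S^\delta_{n_0-1}$. Therefore, using that the sequence $(-\Delta\gamma_n)_{n\ge1}$ is nonnegative,

$$\begin{aligned} \sum_{k=n_0}^{n} \gamma_{k+1}\chi_\delta(Y_k) &\ge n\gamma_{n+1}\varepsilon_0 - C_{n_0} + \sum_{k=n_0}^{n-1} k(-\Delta\gamma_{k+2})\varepsilon_0 = \varepsilon_0\Big( n\gamma_{n+1} + \sum_{k=n_0}^{n-1} k(-\Delta\gamma_{k+2}) \Big) - C_{n_0} \\ &= \varepsilon_0\Big( \gamma_{n+1} + n_0\gamma_{n_0+1} + \sum_{k=n_0+1}^{n-1} \gamma_{k+1} \Big) - C_{n_0} \end{aligned}$$

by a reverse discrete integration by parts. Finally

$$\sum_{k=n_0}^{n} \gamma_{k+1}\chi_\delta(Y_k) \ge \varepsilon_0\Big( \gamma_{n+1} + \sum_{k=n_0+1}^{n-1} \gamma_{k+1} \Big) - C_{n_0} \longrightarrow +\infty \quad \text{as } n \to \infty$$

since $\sum_{n\ge1} \gamma_n = +\infty$. We have then shown that

$$\sum_{k\ge0} \gamma_{k+1}\chi_\delta(Y_k) = +\infty \quad \text{a.s.}$$

Combining this fact with (2.20) classically implies that

$$\liminf_n \Psi_\delta(\theta_n) = 0.$$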

n 

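Let $\Theta_\infty$ be the set of limiting points of the sequence $(\theta_n)_{n\ge0}$. $\Theta_\infty$ is a compact connected set since
$(\theta_n)_{n\ge0}$ is bounded and $\Delta\theta_n \to 0$. Moreover $\{\Psi_\delta = 0\}$ is closed because $\Psi_\delta \ge 0$ is l.s.c., and $\Theta_\infty$
is closed too. So $\big(\Theta_\infty \cap \{\Psi_\delta = 0\}\big)_{\delta>0}$ is a family of nonempty sets, compact since $\Theta_\infty$ is bounded, which
decreases as $\delta \downarrow 0$. As a consequence,

$$\bigcap_{\delta>0} \big(\Theta_\infty \cap \{\Psi_\delta = 0\}\big) \ne \emptyset.$$

The other assumption on $\Psi_\delta$ implies

$$\bigcap_{\delta>0} \big(\Theta_\infty \cap \{\Psi_\delta = 0\}\big) \subset \bigcap_{\delta>0} \{\Psi_\delta = 0\} = \{\theta^*\},$$

so that in fact it is reduced to $\theta^*$. Hence $\theta^*$ is a limiting point of $(\theta_n)_{n\ge0}$, which implies that $L(\theta_n)$
converges towards $L(\theta^*)$. By the assumption on the Lyapunov function $L$, $\{\theta^*\}$ is a connected component
of $\{L = L(\theta^*)\}$ and, as $\Theta_\infty$ is connected, $\Theta_\infty = \{\theta^*\}$. Therefore

$$\theta_n \overset{a.s.}{\longrightarrow} \theta^* \quad \text{as } n \to \infty. \qquad \Box$$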

Back to the i.i.d. innovation setting. ▷ Theorem 2.1 contains the standard martingale approach
"à la" Robbins-Siegmund in the i.i.d. setting. Indeed, we consider the recursive procedure

$$\theta_{n+1} = \theta_n - \gamma_{n+1}K(\theta_n, Y_{n+1}), \quad n \ge 0,$$

where $(Y_n)_{n\ge1}$ is i.i.d. with distribution $\nu$ and $\theta_0$ is independent of $(Y_n)_{n\ge1}$ (all defined on $(\Omega, \mathcal{A}, \mathbb{P})$).
We set $\mathcal{F}_n = \sigma(\theta_0, Y_1, \ldots, Y_n)$, $n \ge 0$, $p = 2$,

$$H(\theta, y) = h(\theta) \quad \text{with} \quad h(\theta) = \int K(\theta, y)\,\nu(dy) \quad \text{and} \quad \Delta M_{n+1} = K(\theta_n, Y_{n+1}) - \mathbb{E}\big[K(\theta_n, Y_{n+1}) \mid \mathcal{F}_n\big].$$

Assume that

$$\forall \theta \in \mathbb{R}^d, \quad \|K(\theta, Y_1)\|_2 \le C_K\,(1 + L(\theta))^{1/2}.$$

Then Assumption (2.12) is satisfied by $h$ and (2.13) holds (with $\phi \equiv 1$). Furthermore, by combining
(2.7) and (2.14), we retrieve the step assumption in the standard Robbins-Monro Theorem, namely

$$\sum_{n\ge1} \gamma_n = +\infty \quad \text{and} \quad \sum_{n\ge1} \gamma_n^2 < +\infty.$$

▷ Another (naive) way to apply Theorem 2.1 in this i.i.d. setting is to rely, under the above
assumption, on the averaging property only: then $H = K$ and $\Delta M_n \equiv 0$. We still consider the
above procedure but we assume furthermore the existence of a pathwise Lyapunov function. By
noticing that $(K(\theta^*, Y_n))_{n\ge1}$ is i.i.d. and in $L^2$, it follows from the quadratic law of large numbers (at
rate $n^{-1/2}$) and the Law of the Iterated Logarithm at rate $O(\varepsilon_n)$ with $\varepsilon_n = \sqrt{\frac{\log\log n}{n}}$ that $K(\theta^*, \cdot) \in \mathcal{V}_{\varepsilon_n,2}$.
As a consequence the condition (2.14) is clearly more restrictive than the above regular one; however,
any step of the form $\gamma_n = \frac{c}{n^a}$, $c > 0$, $\frac{3}{4} < a \le 1$, satisfies (2.14).
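
The rate $\varepsilon_n = \sqrt{\log\log n / n}$ given by the Law of the Iterated Logarithm can be observed numerically. The sketch below is only an illustration under stated assumptions: uniform innovations on $[0, 1]$ play the role of $K(\theta^*, Y_n)$ (up to centring), and the names `lil_rate` and `max_scaled_error` are hypothetical.

```python
import math
import random


def lil_rate(n: int) -> float:
    """eps_n = sqrt(log(log(n)) / n); only meaningful for n >= 3."""
    return math.sqrt(math.log(math.log(n)) / n)


def max_scaled_error(n_max: int = 20000, n_min: int = 16, seed: int = 1) -> float:
    """Largest value of |empirical mean - 1/2| / eps_n over n_min <= n <= n_max.

    The innovations are i.i.d. uniform on [0, 1] (an assumption made for this
    example); a bounded ratio reflects the a.s. averaging rate O(eps_n).
    """
    rng = random.Random(seed)
    total, worst = 0.0, 0.0
    for n in range(1, n_max + 1):
        total += rng.random()
        if n >= n_min:
            worst = max(worst, abs(total / n - 0.5) / lil_rate(n))
    return worst
```

A finite-sample run cannot prove an almost sure rate; it only shows that the scaled error stays of order one along the simulated path.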

2.3 The case of multiplicative noise 

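If we assume that the function $H$ in (2.4) is of the following form

$$\forall \theta \in \mathbb{R}^d, \ \forall y \in \mathbb{R}^q, \quad H(\theta, y) = \chi(y)\,h(\theta) + H(\theta^*, y), \tag{2.21}$$

where $\chi$ is a Borel function such that $\nu(\chi) = 1$, $\chi \in \mathcal{V}_{\varepsilon_n,p}$ and $\sup_{n\ge0} \|\chi(Y_n)\|_{2\vee\frac{p}{p-1}} < +\infty$, $H(\theta^*, \cdot) \in
\mathcal{V}_{\varepsilon_n,p}$ and $\sup_{n\ge0} \|H(\theta^*, Y_n)\|_{2\vee\frac{p}{p-1}} < +\infty$, $h$ is Lipschitz bounded with $h(\theta^*) = 0$, then we replace the
growth assumption (2.12) on $H$ by one on the mean function $h$, i.e.

$$\forall \theta \in \mathbb{R}^d, \quad |h(\theta)| \le C_h\sqrt{1 + L(\theta)} \tag{2.22}$$

and the pathwise mean-reverting assumption (2.15) is the classical

$$\forall \theta \in \mathbb{R}^d \setminus \{\theta^*\}, \quad (\nabla L \mid h)(\theta) > 0. \tag{2.23}$$

Theorem 2.2. The recursive procedure (2.4) with the function $H$ defined by (2.21) and the previous
assumptions on $\chi$ and (2.22)-(2.23) on $h$ satisfies

$$\theta_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} \theta^*.$$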

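Proof. First step: This setting cannot be reduced to the general setting. We use the same notations
as in the proof of Theorem 2.1. With the new form of the function $H$, we obtain

$$\begin{aligned} \Lambda(\theta_{n+1}) &\le \Lambda(\theta_n) - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid \chi(Y_n)h(\theta_n)\big) - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid H(\theta^*, Y_n)\big) \\ &\quad - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid \Delta M_{n+1}\big) + C_L\gamma_{n+1}^2 \frac{|H(\theta_n, Y_n) + \Delta M_{n+1}|^2}{\sqrt{1 + L(\theta_n)}}. \end{aligned}$$

By the same arguments as before we get

$$\mathbb{E}\big[\Lambda(\theta_{n+1}) \mid \mathcal{F}_n\big] \le \Lambda(\theta_n)\big(1 + C'_L\gamma_{n+1}^2\phi(Y_n)^2\big) - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid \chi(Y_n)h(\theta_n)\big) - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid H(\theta^*, Y_n)\big).$$

We set $V_n := \dfrac{A_n}{B_n}$, where $A_n := \Lambda(\theta_n) + \sum_{k=0}^{n-1} \gamma_{k+1}(\nabla\Lambda \mid h)(\theta_k)$ and $B_n := \prod_{k=1}^{n} \big(1 + C'_L\gamma_k^2\phi(Y_{k-1})^2\big)$.

The mean-reverting assumption (2.23) implies that $(A_n)_{n\ge0}$ is a nonnegative process whereas
$(B_n)_{n\ge0}$ is still $\mathcal{F}_{n-1}$-adapted. Elementary computations show that

$$\mathbb{E}\big[A_{n+1} \mid \mathcal{F}_n\big] \le A_n \frac{B_{n+1}}{B_n} - \gamma_{n+1}\big(\nabla\Lambda(\theta_n) \mid H(\theta^*, Y_n)\big) - \gamma_{n+1}\widetilde\chi(Y_n)\,(\nabla\Lambda \mid h)(\theta_n)$$

where $\widetilde\chi(Y_n) := \chi(Y_n) - \nu(\chi)$, $n \ge 0$. Finally we have

$$\forall n \ge 0, \quad \mathbb{E}\big[V_{n+1} \mid \mathcal{F}_n\big] \le V_n - \Delta W_{n+1} - \Delta Z_{n+1}, \tag{2.24}$$

where $W_n := \sum_{k=0}^{n-1} \widetilde\gamma_{k+1}\big(\nabla\Lambda(\theta_k) \mid H(\theta^*, Y_k)\big)$ and $Z_n := \sum_{k=0}^{n-1} \widetilde\gamma_{k+1}\widetilde\chi(Y_k)\,(\nabla\Lambda \mid h)(\theta_k)$ with $\widetilde\gamma_n :=
\dfrac{\gamma_n}{B_n}$, $n \ge 1$.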

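Second step: Following the lines of the proof of Theorem 2.1 we show that the sequence $(W_n)_{n\ge0}$ is
$L^1$-bounded and a.s. converges. Now our aim is to prove the same results for the sequence $(Z_n)_{n\ge0}$.
To this end we set $S^\chi_n := \sum_{k=0}^{n-1} \widetilde\chi(Y_k)$; then an Abel transform yields

$$Z_n = \sum_{k=0}^{n-1} \widetilde\gamma_{k+1}\,\Delta S^\chi_{k+1}\,(\nabla\Lambda \mid h)(\theta_k) = \widetilde\gamma_n S^\chi_n\,(\nabla\Lambda \mid h)(\theta_{n-1}) - \sum_{k=1}^{n-1} S^\chi_k\big( \widetilde\gamma_{k+1}(\nabla\Lambda \mid h)(\theta_k) - \widetilde\gamma_k(\nabla\Lambda \mid h)(\theta_{k-1}) \big).$$

By the same methods as for the sequence $(W_n)_{n\ge0}$ (i.e. using the assumptions on $H$, $\Lambda$ and $(\gamma_n)_{n\ge1}$), we
obtain that

$$Z_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} Z_\infty \quad \text{and} \quad \sup_{n\ge0} \|Z_n\|_1 < +\infty.$$

Thus we have that

$$(V_n + W_n + Z_n)^- \le (W_n + Z_n)^- \le |W_n + Z_n| \in L^1(\mathbb{P}) \quad \text{since} \quad \sup_{n\ge0} \|W_n + Z_n\|_1 < +\infty.$$

As $V_0 = \Lambda(\theta_0) \le C(1 + |\theta_0|) \in L^1$, it follows by induction from (2.24) that, for every $n \ge 0$, $\mathbb{E}\,V_n < +\infty$.
Hence $S_n := V_n + W_n + Z_n$, $n \ge 0$, is a true supermartingale with an $L^1$-bounded negative part. We
then deduce that

$$S_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} S_\infty \in L^1.$$

Now $W_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} W_\infty$ and $Z_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} Z_\infty$ imply that $V_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} V_\infty < +\infty$ a.s.

Third step: Like in the proof of Theorem 2.1, we have that $B_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} B_\infty < +\infty$, which implies that
$A_n \underset{n\to\infty}{\overset{a.s.}{\longrightarrow}} A_\infty < +\infty$. Therefore, using the mean-reverting property (2.23) of $h$ with respect to $\nabla\Lambda$, we
classically derive that

$$\sum_{n\ge0} \gamma_{n+1}\,\nu(\chi)\,(\nabla\Lambda \mid h)(\theta_n) < +\infty \quad \text{a.s.} \tag{2.25}$$

The end of the proof follows the lines of that of Theorem 2.1. $\Box$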

3 Application to quasi-stochastic approximation 

This section is devoted to quasi-random innovations: the innovation sequence $(Y_n)_{n\ge0}$ becomes a
deterministic uniformly distributed (u.d.) sequence $(\xi_{n+1})_{n\ge0}$ over a unit hypercube $[0, 1]^q$, i.e.

$$\theta_{n+1} = \theta_n - \gamma_{n+1}H(\theta_n, \xi_{n+1}), \quad n \ge 0, \quad \theta_0 \in \mathbb{R}^d.$$

We extend the one-dimensional result first introduced in [24] to a general multi-dimensional setting
with unbounded function $H$. We first recall a few definitions and properties of u.d. sequences (see [32]
and the references therein). We emphasize how to apply Theorem 2.1 when $H$ has "bounded variation"
on $[0, 1]^q$ and when $H$ is Lipschitz continuous.

3.1 Definition and characterisation 

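Definition 3.1. A $[0, 1]^q$-valued sequence $(\xi_n)_{n\ge1}$ is uniformly distributed (u.d.) on $[0, 1]^q$ if

$$\frac{1}{n}\sum_{k=1}^{n} \delta_{\xi_k} \underset{n\to\infty}{\Longrightarrow} \lambda_q,$$

where $\lambda_q$ denotes the Lebesgue measure on $[0, 1]^q$.
The proposition below provides a characterisation of uniform distribution for a sequence $(\xi_n)_{n\ge1}$.

Proposition 3.1. (a) Let $(\xi_n)_{n\ge1}$ be a $[0, 1]^q$-valued sequence. Then $(\xi_n)_{n\ge1}$ is uniformly distributed
on $[0, 1]^q$ if and only if

$$D^*_n(\xi) := \sup_{x\in[0,1]^q} \left| \frac{1}{n}\sum_{k=1}^{n} \mathbf{1}_{[\![0,x]\!]}(\xi_k) - \prod_{i=1}^{q} x^i \right| \longrightarrow 0 \quad \text{as } n \to \infty,$$

where $D^*_n(\xi)$ is called the discrepancy at the origin or star discrepancy.

(b) There exist sequences, called sequences with low discrepancy, such that $D^*_n(\xi) = O\Big(\dfrac{(\log n)^q}{n}\Big)$. We
refer to [32, 7] for examples of such sequences (like Halton, Kakutani, Sobol' sequences, etc).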
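
In dimension $q = 1$ the star discrepancy of a finite point set can be computed exactly from the sorted points, which gives a simple way to observe the low discrepancy of a Halton-type sequence. The sketch below is illustrative: it uses the base-2 van der Corput sequence (the one-dimensional Halton sequence) and the classical closed formula $D^*_n = \frac{1}{2n} + \max_i \big|x_{(i)} - \frac{2i-1}{2n}\big|$ for sorted points; the function names are hypothetical.

```python
from typing import Iterable


def van_der_corput(k: int, base: int = 2) -> float:
    """Radical inverse of the integer k >= 0 in the given base."""
    x, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, base)
        denom *= base
        x += digit / denom
    return x


def star_discrepancy_1d(points: Iterable[float]) -> float:
    """Exact star discrepancy of a finite point set in [0, 1] (dimension 1 only)."""
    xs = sorted(points)
    n = len(xs)
    return 1.0 / (2 * n) + max(
        abs(x - (2 * i - 1) / (2 * n)) for i, x in enumerate(xs, start=1)
    )


def vdc_discrepancy(n: int) -> float:
    """Star discrepancy of the first n van der Corput points xi_1, ..., xi_n."""
    return star_discrepancy_1d(van_der_corput(k) for k in range(1, n + 1))
```

For $n = 1024$ the points are, up to one of them, the regular grid of mesh $1/1024$, and the discrepancy is of order $1/n$, far below the $n^{-1/2}$ order typical of i.i.d. samples.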

3.2 Standard classes $\mathcal{V}_{\varepsilon_n,1}$ for quasi-stochastic approximation

We set here $Y_n = \xi_{n+1}$, $\mathcal{F}_n = \{\emptyset, \Omega\}$ and $\Delta M_{n+1} = 0$, $n \ge 0$. The strong Lyapunov condition on $H$ is
crucial here. Note that the function $\phi$ becomes useless since we always consider the case $p = 1$. To
apply Theorem 2.1, we mainly need to specify the accessible classes $\mathcal{V}_{\varepsilon_n,1}$ in such a framework.

▷ Function with finite variation. A function $f : [0, 1]^q \to \mathbb{R}$ has finite variation in the measure
sense if there exists a signed measure $\nu$ on $([0, 1]^q, \mathcal{B}or([0, 1]^q))$ such that $\nu(\{0\}) = 0$ and

$$\forall x \in [0, 1]^q, \quad f(x) = f(\mathbf{1}) + \nu\big([\![0, \mathbf{1} - x]\!]\big)$$

where $[\![x, y]\!] = \prod_{i=1}^{q} [x^i, y^i]$ if $x \le y$ (componentwise) and is empty otherwise, and $\mathbf{1} = (1, \ldots, 1)$. The
variation $V(f)$ of $f$ is then defined as $|\nu|([0, 1]^q)$, where $|\nu|$ denotes the total variation measure attached
to $\nu$. For further details on this notion of variation, see [7]. When $q = 1$ this notion coincides with
left continuous functions with finite variation. As concerns the slightly more general notion of finite
variation in the Hardy and Krause sense, see [32] and the references therein. The role of finite
variation is emphasized by the following error bound.

Proposition 3.2 (Koksma-Hlawka Inequality). Let $\xi = (\xi_1, \ldots, \xi_n) \in ([0, 1]^q)^n$ and let $f$ be a function
with finite variation $V(f)$, either in the Hardy & Krause or in the measure sense. Then

$$\left| \frac{1}{n}\sum_{k=1}^{n} f(\xi_k) - \int_{[0,1]^q} f(u)\,\lambda_q(du) \right| \le V(f)\,D^*_n(\xi).$$

Hence, if $(\xi_n)_{n\ge1}$ has a low discrepancy, the class $\mathcal{V} = \{f : [0, 1]^q \to \mathbb{R} \text{ s.t. } V(f) < +\infty\}$ of functions
with finite variation satisfies $\mathcal{V} \subset \mathcal{V}_{\varepsilon_n,1}$ with $\varepsilon_n = \dfrac{(\log n)^q}{n}$. Consequently, if $H(\theta^*, \cdot) \in \mathcal{V}$, the
assumptions on admissible (non-increasing) step sequences $(\gamma_n)_{n\ge1}$ in Theorem 2.1 read

$$\sum_{n\ge1} \gamma_n = +\infty, \qquad \gamma_n(\log n)^q \to 0 \qquad \text{and} \qquad \sum_{n\ge1} \max\big(|\Delta\gamma_{n+1}|, \gamma_n^2\big)(\log n)^q < +\infty,$$

so that the choice of $\gamma_n := \dfrac{c}{n^\rho}$, $\frac{1}{2} < \rho \le 1$, is admissible (like in the i.i.d. setting).

▷ Lipschitz continuous functions. If $q \ge 2$ it is difficult to check whether $f$ has finite variation in
any sense: in fact these functions become "rare" as $q$ increases. If we look for a more natural regularity
assumption to be satisfied by $H(\theta^*, \cdot)$, like Lipschitz continuity, the following theorem due to Proinov
(see [35]) provides an alternative (but less "attractive") error bound.

Theorem 3.1. (Proinov) Assume $\mathbb{R}^q$ is equipped with the $\ell^\infty$-norm $|(x^1, \ldots, x^q)|_\infty := \max_{1\le i\le q} |x^i|$.
Let $(\xi_1, \ldots, \xi_n) \in ([0, 1]^q)^n$. For every continuous function $f : [0, 1]^q \to \mathbb{R}$,

$$\left| \frac{1}{n}\sum_{k=1}^{n} f(\xi_k) - \int_{[0,1]^q} f(u)\,\lambda_q(du) \right| \le C_q\,w_f\big(D^*_n(\xi_1, \ldots, \xi_n)^{\frac{1}{q}}\big),$$

where $w_f(\delta) := \sup_{x,y\in[0,1]^q,\ |x-y|_\infty\le\delta} |f(x) - f(y)|$, $\delta \in (0, 1)$, is the $\ell^\infty$-uniform continuity modulus of $f$
and $C_q \in (0, \infty)$ is a universal constant only depending on $q$. If $q = 1$, $C_q = 1$ and if $q \ge 2$, $C_q \in [1, 4]$.

As a consequence $\mathrm{Lip}([0, 1]^q, \mathbb{R}) \subset \mathcal{V}_{\varepsilon_n,1}$ with $\varepsilon_n = \dfrac{\log n}{n^{1/q}}$ (with obvious extensions to Hölder functions).

Consequently, if $H(\theta^*, \cdot) \in \mathcal{V}$, the assumptions on admissible (non-increasing) step sequences $(\gamma_n)_{n\ge1}$
in Theorem 2.1 read

$$\sum_{n\ge1} \gamma_n = +\infty, \qquad \gamma_n(\log n)\,n^{1-\frac{1}{q}} \to 0 \qquad \text{and} \qquad \sum_{n\ge1} \max\big(|\Delta\gamma_{n+1}|, \gamma_n^2\big)(\log n)\,n^{1-\frac{1}{q}} < +\infty,$$

so that the choice of $\gamma_n := \frac{c}{n}$ is always admissible (more generally $\gamma_n = c\,n^{-\rho}$, $1 - \frac{1}{2q} < \rho \le 1$). An
application of "quasi-Stochastic Approximation" is proposed in Section 5.1 (see also [15]).
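
A minimal one-dimensional sketch of a quasi-stochastic approximation run is given below. It is only an illustration under stated assumptions: the innovations are the base-2 van der Corput points, the function is the toy choice $H(\theta, u) = \theta - u$ (Lipschitz in $u$, with target $\theta^* = 1/2$ and pathwise Lyapunov function $L(\theta) = |\theta - \theta^*|^2$), and the step is $\gamma_n = 1/n$. None of these choices or names comes from the paper.

```python
def van_der_corput(k: int, base: int = 2) -> float:
    """Radical inverse of the integer k >= 0 in the given base."""
    x, denom = 0.0, 1.0
    while k > 0:
        k, digit = divmod(k, base)
        denom *= base
        x += digit / denom
    return x


def qsa_mean(n_steps: int, theta0: float = 0.0) -> float:
    """Run theta_{n+1} = theta_n - gamma_{n+1} * H(theta_n, xi_{n+1}).

    Toy assumptions: H(theta, u) = theta - u, gamma_n = 1 / n and
    xi_n = van_der_corput(n), so theta_n is the running mean of the xi_k.
    """
    theta = theta0
    for n in range(n_steps):
        xi = van_der_corput(n + 1)        # deterministic innovation xi_{n+1}
        theta -= (theta - xi) / (n + 1)   # gamma_{n+1} = 1 / (n + 1)
    return theta
```

With these choices the error after $n$ steps is of order $1/n$ up to logarithmic factors, in line with the Koksma-Hlawka bound for a function of finite variation.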

4 Applications to different types of random innovations 

This section is devoted to some first applications of the above theorem. By applications, we mean
here pointing out some classes of random innovation processes $(Y_n)_{n\ge0}$ for which the averaging rate
assumption (2.8) is naturally satisfied by a "large" class $\mathcal{V}_{\varepsilon_n,p}$.

First we present a simple framework of stochastic approximation where the noise is additive, which
is studied in [9] with some mixing properties, but here we only need (2.5). We showed in [26] how easily
our result applies to real life stochastic optimization problems (as far as convergence is concerned).

Afterwards we focus on mixing innovations: we consider that the sequence $(Y_n)_{n\ge0}$ is a functional
of a stationary $\alpha$-mixing process (satisfying a condition on the summability of the mixing coefficients).

The last application is the case of a homogeneous Markov chain, which can be seen as a possible
more elementary counterpart of some (convergence) results obtained e.g. in [6]. Some (quasi-optimal)
a.s. rate of convergence can be obtained if $H$ is smooth enough in $\theta$ (see [25]), but to establish a regular
Central Limit Theorem it is most likely that we cannot avoid dealing with the Poisson equation.

4.1 Recursive procedure with additive noise 

We consider here the case where the function $H$ is the sum of the mean function $h$ and a noise, namely (with $q = d$)

$$\forall\,\theta\in\mathbb{R}^d,\ \forall\,y\in\mathbb{R}^q, \qquad H(\theta,y) = h(\theta)+y, \qquad \text{and} \qquad \Delta M_{n+1} = 0.$$

In this framework, the Lyapunov assumption (2.15) becomes classical, involving only the mean function $h$, namely

$$\forall\,\theta\in\mathbb{R}^d\setminus\{\theta^*\}, \qquad \langle\nabla L\,|\,h\rangle(\theta) > 0.$$

Likewise, the growth control assumption (2.12) amounts to

$$\forall\,\theta\in\mathbb{R}^d, \qquad |h(\theta)| \le C_h\sqrt{1+L(\theta)},$$

provided the moment assumption $\sup_n\|Y_n\|_{\frac{p}{p-1}} < +\infty$, for some $p\in(1,\infty]$, is satisfied (take $\phi(y) := |y|\vee 1$). The martingale is vanishing in this example. Finally the step assumption (2.14) is ruled by the averaging rate of the sequence $(Y_n)_{n\ge 0}$.

4.2 Functional of a stationary a-mixing process 

Here we provide a short background on $\alpha$-mixing processes and their functionals. Our motivation here is to relax as much as possible our assumption on $(Y_n)_{n\ge 0}$ in order to apply stochastic approximation methods to exogenous, possibly non-Markovian, stationary data.

We aim now at applying our convergence theorem to input sequences $(Y_n)_{n\ge 0}$ which are (causal) functionals of an $\alpha$-mixing process. Consider a stationary $\mathbb{R}^q$-valued process $X = (X_k)_{k\in\mathbb{Z}}$, its natural filtration $\mathcal{F}_n = \mathcal{F}_n^X := \sigma(X_k,\ k\le n)$ and $\mathcal{G}_n = \mathcal{G}_n^X := \sigma(X_k,\ k\ge n)$. The $\alpha$-mixing coefficients are defined as follows

$$\alpha_n = \sup\big\{|P(U\cap V)-P(U)P(V)|,\ U\in\mathcal{F}_k,\ V\in\mathcal{G}_{k+n},\ k\ge 0\big\}. \qquad (4.26)$$

Let $f$ be a measurable mapping from $(\mathbb{R}^q)^{\mathbb{N}}$ to $\mathbb{R}$. Let $(Y_k)_{k\in\mathbb{Z}}$ be a causal functional of $X$, i.e.

$$\forall\,n\in\mathbb{Z}, \qquad Y_n := f(\dots,X_{n-1},X_n).$$

Then $(Y_n)_{n\ge 0}$ is a stationary process with marginal distribution $\nu = \mathcal{L}(Y_0)$.

The proposition below shows that if $(X_n)_{n\in\mathbb{Z}}$ is $\alpha$-mixing "fast enough" then $H(\theta^*,\cdot)$ "almost" lies in $\mathcal{V}_{n^{-1/2},2}$ (up to a logarithmic factor) as soon as $E\,|H(\theta^*,Y_0)|^{2+\delta} < +\infty$ for some $\delta > 0$ (in practice this has to hold for $H(\theta,\cdot)$ for every $\theta$ since we do not know $\theta^*$ a priori).

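Proposition 4.1. Assume $g\in L^{2+\delta}(\nu)$, $\delta > 0$, and that one of the following assumptions holds:

(a) For all $n\in\mathbb{Z}$, $Y_n := f(\dots,X_{n-1},X_n)$ where $X$ is a stationary $\alpha$-mixing process satisfying

$$\sum_{k\ge 1}\frac{\alpha_k^{\frac{\delta}{2(2+\delta)}}}{\sqrt{k}} < +\infty. \qquad (4.27)$$

(b) $Y_n = X_n$, $n\ge 0$, and $X$ is a stationary $\alpha$-mixing process satisfying the condition

$$\sum_{k\ge 1}\alpha_k^{\frac{\delta}{2+\delta}} < +\infty. \qquad (4.28)$$

Then

$$g\in\mathcal{V}_{\varepsilon_n,2} \quad \text{with} \quad \varepsilon_n = (\log n)^{\frac32+\eta}\,n^{-\frac12}, \quad \text{for every } \eta > 0. \qquad (4.29)$$

In particular $g$ lies in $\mathcal{V}_{n^{-\beta},2}$ for every $\beta\in(0,\frac12)$.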

Remark. Condition (4.28) is satisfied when the underlying process $X$ is geometrically $\alpha$-mixing. Slightly refined results could be obtained by calling upon Philipp and Stout's Law of the Iterated Logarithm, but the resulting claims would be significantly more technical to state for little practical benefit.

An application of this result (based on real data) is briefly developed in Section 5.5. The proof of Proposition 4.1 relies on the Gál-Koksma Theorem (see [16] and [1] for a probabilistic version). We state it here in a stationary framework.
Theorem 4.1. (Gál-Koksma's Theorem) Let $(\Omega,\mathcal{A},P)$ be a probability space and let $(Z_n)_{n\ge 1}$ be a sequence of random variables belonging to $L^p$, $p\ge 1$, satisfying

$$E\,|Z_1+Z_2+\dots+Z_N|^p = O(\varphi(N))$$

where $\frac{\varphi(N)}{N}$, $N\ge 1$, is a nondecreasing sequence. Then for every $\eta > 0$,

$$Z_1(\omega)+Z_2(\omega)+\dots+Z_N(\omega) = o\Big(\big(\varphi(N)(\log N)^{p+1+\eta}\big)^{\frac1p}\Big) \qquad P(d\omega)\text{-a.s.}$$

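Remark. The conditions on $X$ and $Z$ come from a result established in [10]: by setting $P_0(Z_k) := E[Z_k\,|\,\mathcal{F}_0]-E[Z_k\,|\,\mathcal{F}_{-1}]$, if

$$\sum_{k\in\mathbb{Z}}\|P_0(Z_k)\|_2 < +\infty \qquad \text{then} \qquad \sum_{k\in\mathbb{Z}}|\mathrm{Cov}(Z_0,Z_k)| < +\infty. \qquad (4.30)$$

Moreover using [34], condition (4.30) is satisfied as soon as

$$\sum_{k=1}^{\infty}\frac{1}{\sqrt{k}}\,\big\|E[Z_0\,|\,\mathcal{G}_k]\big\|_2 < +\infty. \qquad (4.31)$$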

Proof of Proposition 4.1. Let $Z_n = g(Y_n)-\int g(y)\,\nu(dy)$, $n\in\mathbb{Z}$. Without loss of generality we may assume $\int g(y)\,\nu(dy) = 0$.

(a) We will rely on the above Gál-Koksma Theorem (Theorem 4.1). First, we evaluate $E\,|Z_0+\dots+Z_{n-1}|^2$. Setting $S_k^Z = \sum_{j=1}^k E[Z_jZ_0]$, $k\in\mathbb{N}$, elementary computations lead to

$$E\,|Z_0+\dots+Z_{n-1}|^2 = n\,E Z_0^2+2\sum_{k=1}^{n-1}\sum_{j=1}^{k}E[Z_jZ_0] = n\,E Z_0^2+2\sum_{k=1}^{n-1}S_k^Z = n\Big(E Z_0^2+\frac{2}{n}\sum_{k=1}^{n-1}S_k^Z\Big).$$

To establish that $S_k^Z$ converges, we will establish (4.31). Set $B_2(\mathcal{G}_k) := \{W\in L^2(\mathcal{G}_k) : \|W\|_2\le 1\}$. Then

$$\big\|E(Z_0\,|\,\mathcal{G}_k)\big\|_2 = \sup_{W\in B_2(\mathcal{G}_k)}E(WZ_0) \le 8\,\alpha_k^{\frac1r}\,\|g(Y_0)\|_p$$

owing to the classical covariance inequality for $\alpha$-mixing processes (see [12], Theorem 3(1), p. 9) with $\frac1r+\frac1p = \frac12$, $r,p > 2$. As $g\in L^{2+\delta}(\nu)$, $\delta > 0$, we may set $p = 2+\delta$ and $r = \frac{2(2+\delta)}{\delta}$. As a consequence of (4.27),

$$\sum_{k=1}^{\infty}\frac{1}{\sqrt{k}}\,\big\|E(Z_0\,|\,\mathcal{G}_k)\big\|_2 < +\infty,$$

which implies (owing to (4.30)) that $S_k^Z$ converges. Now, by Cesàro's Lemma, $\frac1n\sum_{k=1}^{n-1}S_k^Z$ converges as well, so that

$$E\,|Z_0+\dots+Z_{n-1}|^2 = O(n) \qquad \text{or equivalently} \qquad \Big\|\frac{Z_0+\dots+Z_{n-1}}{n}\Big\|_2 = O\big(n^{-\frac12}\big).$$

Thus, one concludes by Gál-Koksma's Theorem since, for every $\eta > 0$,

$$\frac{Z_0+\dots+Z_{n-1}}{n} = o\Big(\frac{(\log n)^{\frac32+\eta}}{\sqrt{n}}\Big) \qquad P\text{-a.s.}$$
(b) If we assume that $Y_n = X_n$, $n\ge 0$, we directly use the covariance inequality for $\alpha$-mixing processes

$$|\mathrm{Cov}(Z_j,Z_0)| \le 8\,\alpha_j^{\frac1r}\,\|Z_0\|_p\,\|Z_0\|_q,$$

where $\frac1r+\frac1p+\frac1q = 1$. By symmetry, we take $p = q > 2$ and we get

$$|E(Z_jZ_0)| \le 8\,\alpha_j^{1-\frac2p}\,\|Z_0\|_p^2.$$

As $g\in L^{2+\delta}(\nu)$, $\delta > 0$, we may set $p = 2+\delta$ and we obtain $\alpha_j^{1-\frac2p} = \alpha_j^{\frac{\delta}{2+\delta}}$. The condition (4.27) can thus be replaced by the less stringent Ibragimov's condition (4.28) to complete the proof. □
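
The variance identity used in part (a) of the proof can be checked numerically on a simple case. The sketch below assumes, purely for illustration, that $(Z_n)$ is a stationary AR(1) sequence with unit innovation variance and coefficient `rho`, so that $E[Z_jZ_0] = \rho^{|j|}/(1-\rho^2)$; it compares the direct double sum of covariances with the expression $n\,EZ_0^2+2\sum_{k=1}^{n-1}S_k^Z$ and shows that the variance of the sum grows like $O(n)$.

```python
def autocov_ar1(j, rho):
    """E[Z_j Z_0] for a stationary AR(1) with unit innovation variance (assumption)."""
    return rho ** abs(j) / (1.0 - rho ** 2)


def variance_direct(n, rho):
    """E|Z_0 + ... + Z_{n-1}|^2 as the full double sum of covariances."""
    return sum(autocov_ar1(i - j, rho) for i in range(n) for j in range(n))


def variance_via_partial_sums(n, rho):
    """n E[Z_0^2] + 2 * sum_{k=1}^{n-1} S_k, with S_k = sum_{j=1}^{k} E[Z_j Z_0]."""
    total = n * autocov_ar1(0, rho)
    s_k = 0.0
    for k in range(1, n):
        s_k += autocov_ar1(k, rho)   # S_k built incrementally
        total += 2.0 * s_k
    return total


if __name__ == "__main__":
    rho = 0.6
    print(variance_direct(50, rho), variance_via_partial_sums(50, rho))
    # The ratio variance / n stays bounded, i.e. the variance is O(n).
    for n in (100, 1000, 10000):
        print(n, variance_via_partial_sums(n, rho) / n)
```

Since the partial sums $S_k^Z$ converge here, the ratio printed in the loop stabilizes, which is exactly the Cesàro argument of the proof.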

4.3 Homogeneous Markov chain 

Assume that the innovation process $(Y_n)_{n\ge 0}$ is an $\mathbb{R}^q$-valued homogeneous Markov chain with transition $(P(y,dz))_{y\in\mathbb{R}^q}$ and starting distribution $\mu = \mathcal{L}(Y_0)$. For convenience we will assume that the chain lives on its canonical space $\big((\mathbb{R}^q)^{\mathbb{N}},\ \mathcal{B}or(\mathbb{R}^q)^{\otimes\mathbb{N}}\big)$.

4.3.1 Application of the convergence theorem 

We consider the classical Markov stochastic approximation procedure

$$\theta_{n+1} = \theta_n-\gamma_{n+1}K(\theta_n,Y_{n+1}), \qquad n\ge 0, \qquad (4.32)$$

where $K : \mathbb{R}^d\times\mathbb{R}^q\to\mathbb{R}^d$ is a Borel function satisfying (4.33) below and $\theta_0 : (\Omega,\mathcal{A},P)\to\mathbb{R}^d$ is independent of $(Y_n)_{n\ge 0}$. Note that $(Y_n)_{n\ge 0}$ is still a Markov chain with respect to $\mathcal{F}_n = \sigma(\theta_0,Y_0,\dots,Y_n)$, $n\ge 0$.

Set $H(\theta,y) := P(K(\theta,\cdot))(y)$ and $\Delta M_{n+1} := K(\theta_n,Y_{n+1})-E[K(\theta_n,Y_{n+1})\,|\,\mathcal{F}_n]$. Then the procedure has the canonical form (1.2) with respect to the filtration $(\mathcal{F}_n)_{n\ge 0}$.

Remark. If we consider that the Markov chain starts from $Y_1$, then $\mathcal{F}_n = \sigma(\theta_0,Y_1,\dots,Y_n)$ and $E[K(\theta_0,Y_1)\,|\,\mathcal{F}_0] = E[K(\theta,Y_1)]_{|\theta=\theta_0} = \mu P(K(\theta,\cdot))_{|\theta=\theta_0}$ since $\theta_0$ and $Y_1$ are independent.

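Let $p\in[1,\infty)$ and set $r = 2\vee\frac{p}{p-1}\in[2,+\infty]$. We make the following growth assumption on the function $K$:

$$\forall\,\theta\in\mathbb{R}^d,\ \forall\,y\in\mathbb{R}^q, \qquad |K(\theta,y)| \le C_K\,\phi(y)\sqrt{1+L(\theta)} \qquad (4.33)$$

where $L : \mathbb{R}^d\to\mathbb{R}_+$ satisfies (2.9) and $\sup_{n\ge 0}\|\phi(Y_n)\|_r < +\infty$.

Then $H$ satisfies (2.12) with $P\phi(y) = E_y\,\phi(Y_1) \le \|\phi\|_{L^r(P(y,dz))} < +\infty$ in place of $\phi(y)$, and $\Delta M_{n+1}$ satisfies (2.13) with $\|\phi\|_{L^r(P(y,dz))}$ in place of $\phi(y)$, so that, finally, we may choose $\widetilde{\phi}(y) := \|\phi\|_{L^r(P(y,dz))}$ in both assumptions, having in mind that $\|\widetilde{\phi}(Y_n)\|_r = \|\phi(Y_{n+1})\|_r$. Now, the proposition below straightforwardly follows from Theorem 2.1.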

Proposition 4.2. Let $p\in[1,\infty)$ and $\theta^*\in\mathbb{R}^d$. If $K$ satisfies (4.33) and $H$ satisfies the strict pathwise Lyapunov assumption (2.15), if $(\gamma_n)_{n\ge 1}$ satisfies (2.14) for a sequence $(\varepsilon_n)_{n\ge 1}$ satisfying (2.7) and $H(\theta^*,\cdot)\in\mathcal{V}_{\varepsilon_n,p}$, then the recursive procedure with Markov innovations defined by (4.32) converges, i.e.

$$\theta_n \xrightarrow{\ a.s.\ } \theta^*.$$
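
To make the Markov setting concrete, here is a minimal sketch of the recursion (4.32) on a toy case. All modelling choices are illustrative assumptions: the innovations form a Gaussian AR(1) chain with stationary mean `mean`, and $K(\theta,y) = \theta-y$, so that the target $\theta^*$ is the stationary mean. With the step $\gamma_n = 1/n$ the iterate coincides with the running empirical mean of the chain, which gives an easy sanity check.

```python
import random


def ar1_chain(n, rho=0.5, mean=2.0, seed=0):
    """Simulate Y_1, ..., Y_n of a Gaussian AR(1) chain (hypothetical innovations)."""
    rng = random.Random(seed)
    y = mean
    path = []
    for _ in range(n):
        y = mean + rho * (y - mean) + rng.gauss(0.0, 1.0)
        path.append(y)
    return path


def K(theta, y):
    """Toy field: its mean function vanishes at the stationary mean of the chain."""
    return theta - y


def run_procedure(ys, theta0=0.0):
    """theta_{n+1} = theta_n - gamma_{n+1} K(theta_n, Y_{n+1}) with gamma_n = 1/n."""
    theta = theta0
    for n, y_next in enumerate(ys):       # n = 0, 1, 2, ...
        gamma = 1.0 / (n + 1)             # gamma_{n+1}
        theta -= gamma * K(theta, y_next)
    return theta


if __name__ == "__main__":
    ys = ar1_chain(5000)
    print(run_procedure(ys), sum(ys) / len(ys))
```

Here $L(\theta) = \frac12|\theta-\theta^*|^2$ is a pathwise Lyapunov function because $K(\theta,y)-K(\theta^*,y) = \theta-\theta^*$ does not depend on $y$.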

4.3.2 Ergodic framework description

We will say that the Markov chain $(Y_n)_{n\ge 0}$ (starting from $Y_0\sim\mu$) is $\nu$-ergodic (resp. $\nu$-stable) under $P_\mu$ if, for every bounded Borel (resp. continuous) function $f : \mathbb{R}^q\to\mathbb{R}$,

$$P_\mu\text{-a.s.} \qquad \frac1n\sum_{k=0}^{n-1}f(Y_k) \xrightarrow[n\to\infty]{} \int f\,d\nu. \qquad (4.34)$$

As soon as the transition $(P(y,dz))_{y\in\mathbb{R}^q}$ of $(Y_n)_{n\ge 0}$ is Feller, the above $\nu$-stability property implies that $\nu$ is an invariant distribution of the chain, i.e. $\nu P = \nu$. In case of $\nu$-ergodicity the same conclusion holds unconditionally. As a consequence the whole sequence $(Y_n)_{n\ge 0}$ is stationary under $P_\nu$.

Let us focus on the case $\mu = \nu$. If (4.34) holds (with $\mu = \nu$), it is classical background that the whole chain is ergodic under $P_\nu$ (on the canonical space) for the shift operator $\Theta$, i.e. by Birkhoff's theorem, for every functional $F : \big((\mathbb{R}^q)^{\mathbb{N}},\ \mathcal{B}or(\mathbb{R}^q)^{\otimes\mathbb{N}}\big)\to\mathbb{R}$, $F\in L^p(P_\nu)$,

$$\frac1n\sum_{k=0}^{n-1}F\circ\Theta^k \longrightarrow E_\nu(F) \qquad P_\nu\text{-a.s. and in } L^p(P_\nu)$$

so that by considering $F((y_n)_{n\ge 0}) = f(y_0)$, $f\in L^p(\nu)$, we finally get that

$$\mathcal{V}_{0^+,p}(P_\nu) = L^p(\nu).$$

Note that if the set of invariant distributions for $P$ is (convex and) weakly compact and if $\nu$ is extremal in it (which is e.g. the case if $\nu$ is unique!) then the chain is ergodic under $P_\nu$ so that the above equality still holds. Furthermore, we know by a straightforward application of Gál-Koksma's Theorem that, for any $g\in L^2(\nu)$ for which the related Poisson equation $g-\nu(g) = \varphi_g-P\varphi_g$ has a solution $\varphi_g\in L^2(\nu)$,

$$E_\nu\big|g(Y_0)+\dots+g(Y_{n-1})-n\,\nu(g)\big|^2 = E_\nu\Big|\varphi_g(Y_0)-P\varphi_g(Y_{n-1})+\sum_{1\le k\le n-1}\big(\varphi_g(Y_k)-P\varphi_g(Y_{k-1})\big)\Big|^2 \le 6\,\nu(\varphi_g^2)+3(n-1)\,E_\nu\big(\varphi_g(Y_1)-P\varphi_g(Y_0)\big)^2 = O(n)$$

(the summands are orthogonal martingale increments under $P_\nu$), so that $g\in\bigcap_{\beta\in(0,\frac12)}\mathcal{V}_{n^{-\beta},2}(P_\nu)$.

Now we will make a connection between the classes $\mathcal{V}_{\varepsilon_n,p}(P_\nu)$ and $\mathcal{V}_{\varepsilon_n,p'}(P_y)$ which will provide examples of non-stationary (Markovian) innovations that can be "plugged" into stochastic approximation procedures in the spirit of Theorem 2.1.

Proposition 4.3. Let $p\in[1,+\infty)$ and let $p'\in(0,p]$. Assume that $(Y_n)_{n\ge 0}$ is $\nu$-ergodic and that $P(y,dz) = g(y,z)\,\nu(dz)$, $y\in\mathbb{R}^q$, where $g : (\mathbb{R}^q)^2\to\mathbb{R}_+$ satisfies $\nu(dz)$-a.e., $g(\cdot,z) > 0$. Then $\nu$ is the unique invariant distribution of $P$ and, for every sequence $(\varepsilon_n)_{n\ge 0}$ satisfying (2.7),

$$\forall\,y\in\mathbb{R}^q, \qquad g(y,\cdot)\in L^{\frac{p}{p-p'}}(\nu) \Longrightarrow \mathcal{V}_{\varepsilon_n,p'}(P_y) \supset \mathcal{V}_{\varepsilon_n,p}(P_\nu).$$

Proof. It follows from the assumption that any invariant distribution $\nu'$ is equivalent to $\nu$, which classically implies uniqueness.
▷ The a.s. rate. Let $f\in L^p(\nu)$, $y\in\mathbb{R}^q$ and

$$A_f := \Big\{\omega : \frac1n\sum_{k=0}^{n-1}f(Y_k(\omega))-\int f\,d\nu = O(\varepsilon_n)\Big\}.$$

Since $\liminf_n n\,\varepsilon_n > 0$, if $\Theta$ denotes the shift operator on the canonical space of the chain $(Y_n)_{n\ge 0}$, $A_f$ clearly satisfies $A_f = \Theta^{-1}(A_f)$ i.e. $1_{A_f} = 1_{A_f}\circ\Theta$. Therefore

$$P_y(A_f) = E_y(1_{A_f}) = E_y(1_{A_f}\circ\Theta) = E_y\big(P_{Y_1}(A_f)\big).$$

Assume now $f\in\mathcal{V}_{\varepsilon_n,p}(P_\nu)$. By assumption $P_\nu(A_f) = 1$. Let $y\in\mathbb{R}^q$. Then

$$P_\nu(A_f) = \int\nu(dz)\,P_z(A_f) = 1 \qquad \text{so that} \qquad \nu(dz)\text{-a.s.} \quad P_z(A_f) = 1.$$

Now $P(y,dz)\ll\nu(dz)$ implies $\int_{\mathbb{R}^q}P(y,dz)\,P_z(A_f) = 1$ i.e. $E_y\big[P_{Y_1}(A_f)\big] = 1$ or equivalently $P_y(A_f) = 1$.
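▷ The $L^{p'}$-rate. Let $p'\in(0,p]$, let $f\in\mathcal{V}_{\varepsilon_n,p}(P_\nu)\subset L^p(\nu)$ and

$$\varphi_n(y) := \Big\|\frac1n\sum_{k=0}^{n-1}f(Y_k)-\int f\,d\nu\Big\|_{L^{p'}(P_y)}.$$

Then

$$\Big\|\frac1n\sum_{k=0}^{n-1}f(Y_k)-\int f\,d\nu\Big\|_{L^{p}(P_\nu)} = O(\varepsilon_n) \qquad \text{so that} \qquad \int\varphi_n^p(y)\,\nu(dy) = O(\varepsilon_n^p). \qquad (4.35)$$

Assume temporarily that $p'\ge 1$. Consequently, Minkowski's inequality implies

$$\varphi_n(y) \le \frac{\big|f(y)-\int f\,d\nu\big|}{n}+\frac{n-1}{n}\,\Big\|\frac{1}{n-1}\sum_{k=1}^{n-1}f(Y_k)-\int f\,d\nu\Big\|_{L^{p'}(P_y)} = \frac{\big|f(y)-\int f\,d\nu\big|}{n}+\frac{n-1}{n}\,\big\|\varphi_{n-1}(Y_1)\big\|_{L^{p'}(P_y)}$$

where we used the Markov property in the last equality. Since $P(y,dz) = g(y,z)\,\nu(dz)$, we derive from Hölder's inequality (applied with $r = \frac{p}{p'}$ and $s = \frac{p}{p-p'}$)

$$E_y\,\varphi_{n-1}(Y_1)^{p'} = \int\varphi_{n-1}(z)^{p'}\,P(y,dz) = \int\varphi_{n-1}(z)^{p'}\,g(y,z)\,\nu(dz) \le \|g(y,\cdot)\|_{L^{\frac{p}{p-p'}}(\nu)}\Big(\int\varphi_{n-1}(z)^{p}\,\nu(dz)\Big)^{\frac{p'}{p}} \le \|g(y,\cdot)\|_{L^{\frac{p}{p-p'}}(\nu)}\,O\big(\varepsilon_n^{p'}\big)$$

owing to (4.35). Finally

$$\varphi_n(y) \le \frac{\big|f(y)-\int f\,d\nu\big|}{n}+\Big(1-\frac1n\Big)\|g(y,\cdot)\|^{\frac{1}{p'}}_{L^{\frac{p}{p-p'}}(\nu)}\,O(\varepsilon_n) = O(\varepsilon_n) \qquad \text{i.e.} \qquad f\in\mathcal{V}_{\varepsilon_n,p'}(P_y).$$

The case $p'\in(0,1)$ follows by the usual adjustments (pseudo-Minkowski inequality, etc). □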

Comments. By contrast with the approach of [6], it is not mandatory to solve the Poisson equation related to the pseudo-transition

$$\Pi_{\theta_n}(Y_n,dz) = P(Y_{n+1}\in dz\,|\,\mathcal{F}_n)$$

of the algorithm. Indeed, they assume there exists a function $v_\theta := v(\theta,\cdot)$ solution to

$$(\mathrm{Id}-\Pi_\theta)\,v_\theta = H(\theta,\cdot)-h(\theta) \qquad (4.36)$$

(Assumption (A.4) in [6] p. 220). The target $\theta^*$ is then a zero of the mean function $h$ (not canonically defined at this stage in [6]). In our setting, $\Pi_\theta(y,dz) = P(y,dz)$ since the dynamics of $(Y_n)_{n\ge 0}$ does not depend upon $\theta$, so that Condition (4.36) reads

$$v(\theta,y)-\int_{\mathbb{R}^q}v(\theta,z)\,P(y,dz) = H(\theta,y)-h(\theta)$$

where the mean function is naturally defined by

$$h(\theta) = \int_{\mathbb{R}^q}H(\theta,y)\,\nu(dy)$$

($\nu$ is the unique invariant probability measure for $P$). Then the family of Poisson equations (indexed by the parameter $\theta$) reads

$$v(\theta,y)-Pv(\theta,y) = H(\theta,y)-h(\theta).$$

A formal solution is given by $v(\theta,y) = \sum_{k\ge 0}P^k\big(H(\theta,\cdot)-h(\theta)\big)(y)$, but the point is precisely to establish its existence and its properties by using the mixing properties of the semi-group $P$ (see [6]).
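
For a finite-state chain the formal series above converges geometrically and can be summed directly. The sketch below is only meant to illustrate the Poisson equation $v-Pv = H-h$ for one fixed $\theta$: the two-state transition matrix `P`, its invariant distribution `nu` and the vector `H` are hypothetical values chosen for the example.

```python
def mat_vec(P, v):
    """Apply the transition matrix P to the function (vector) v."""
    return [sum(p_ij * v_j for p_ij, v_j in zip(row, v)) for row in P]


def poisson_solution(P, nu, H, n_terms=200):
    """Truncated series sum_{k < n_terms} P^k (H - nu(H)) on a finite state space."""
    h = sum(n_i * H_i for n_i, H_i in zip(nu, H))   # mean function value nu(H)
    term = [H_i - h for H_i in H]                   # P^0 (H - h)
    v = [0.0] * len(H)
    for _ in range(n_terms):
        v = [a + b for a, b in zip(v, term)]
        term = mat_vec(P, term)                     # next power of P applied to H - h
    return v, h


# Hypothetical two-state chain; nu is its invariant distribution (nu P = nu).
P = [[0.9, 0.1], [0.3, 0.7]]
nu = [0.75, 0.25]
H = [1.0, -2.0]

if __name__ == "__main__":
    v, h = poisson_solution(P, nu, H)
    Pv = mat_vec(P, v)
    print("v - Pv :", [a - b for a, b in zip(v, Pv)])
    print("H - h  :", [H_i - h for H_i in H])
```

On a general state space no such direct summation is available, which is why establishing the existence and regularity of $v$ is the delicate step in [6].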

5 Applications and numerical examples 

This section is devoted to several examples (mainly inspired by Finance) of application of the convergence theorems in the different frameworks developed in Sections 3 and 4.

5.1 Application to implicit correlation search by quasi-stochastic approximation 

Consider a 2-dimensional Black-Scholes model i.e. $X_t^0 = e^{rt}$ (riskless asset) and

$$\forall\,t\ge 0, \qquad X_t^i = x_0^i\,e^{(r-\frac{\sigma_i^2}{2})t+\sigma_iW_t^i}, \qquad x_0^i > 0, \quad i = 1,2,$$

for the two risky assets where $\langle W^1,W^2\rangle_t = \rho\,t$, $\rho\in[-1,1]$. Consider a best-of call option characterized by its payoff

$$\big(\max(X_T^1,X_T^2)-K\big)_+.$$

We will use a stochastic recursive procedure to solve the inverse problem in $\rho$

$$P_{BoC}(x_0^1,x_0^2,K,\sigma_1,\sigma_2,r,\rho,T) = P_0^{market}$$

where $P_0^{market}$ is the quoted premium of the option (mark-to-market) with

$$P_{BoC}(x_0^1,x_0^2,K,\sigma_1,\sigma_2,r,\rho,T) := e^{-rT}E\Big[\big(\max(X_T^1,X_T^2)-K\big)_+\Big] = e^{-rT}E\Big[\Big(\max\big(x_0^1e^{\mu_1T+\sigma_1\sqrt{T}Z^1},\ x_0^2e^{\mu_2T+\sigma_2\sqrt{T}(\rho Z^1+\sqrt{1-\rho^2}Z^2)}\big)-K\Big)_+\Big]$$

where $\mu_i = r-\frac{\sigma_i^2}{2}$, $i = 1,2$, $Z = (Z^1,Z^2)\sim\mathcal{N}(0;I_2)$. We assume from now on that this equation (in $\rho$) has at least one solution, say $\rho^*$. The most convenient way to prevent edge effects due to the fact that $\rho\in[-1,1]$ is to use a trigonometric parametrization of the correlation by setting $\rho = \cos\theta$, $\theta\in\mathbb{R}$. This introduces an over-parametrization since $\theta$ and $2\pi-\theta$ yield the same solution inside $[0,2\pi]$, but this is not at all a significant problem for practical implementation (a careful examination shows that in fact one equilibrium is repulsive and one is attractive). From now on, for convenience, we will just mention the dependence of the premium function in the variable $\theta$, namely

$$\theta\longmapsto P(\theta) := P_{BoC}(x_0^1,x_0^2,K,\sigma_1,\sigma_2,r,\cos(\theta),T).$$

The function $P$ is a $2\pi$-periodic continuous function. Extracting the implicit correlation from the market amounts to solving

$$P(\theta) = P_0^{market} \qquad (\text{with } \rho = \cos\theta).$$

We need the following additional assumption

$$P_0^{market}\in\big(\min_\theta P,\ \max_\theta P\big)$$

i.e. that $P_0^{market}$ is not an extremal value of $P$. It is natural to set, for every $\theta\in\mathbb{R}$ and every $z = (z^1,z^2)\in\mathbb{R}^2$,

$$H(\theta,z) = e^{-rT}\Big(\max\big(x_0^1e^{\mu_1T+\sigma_1\sqrt{T}z^1},\ x_0^2e^{\mu_2T+\sigma_2\sqrt{T}(z^1\cos\theta+z^2\sin\theta)}\big)-K\Big)_+-P_0^{market}$$

and to define the recursive procedure

$$\theta_{n+1} = \theta_n-\gamma_{n+1}H(\theta_n,Z_{n+1}), \quad n\ge 0, \qquad \text{where } Z_{n+1}\sim\mathcal{N}(0;I_2),$$

and the gain parameter sequence satisfies (2.14). For every $z\in\mathbb{R}^2$, $\theta\mapsto H(\theta,z)$ is continuous and $2\pi$-periodic, which implies that the mean function $h(\theta) := E\,H(\theta,Z_1) = P(\theta)-P_0^{market}$ and $\theta\mapsto E\big[H^2(\theta,Z_1)\big]$ are both continuous and $2\pi$-periodic as well (hence bounded).

The main difficulty in applying Theorem 2.1 is to find the appropriate Lyapunov function. The quoted value $P_0^{market}$ is not an extremum of the function $P$, hence $\int_0^{2\pi}h_\pm(\theta)\,d\theta > 0$ where $h_\pm := \max(\pm h,0)$. We consider $\theta_0^*$ any (fixed) solution to the equation $h(\theta) = 0$ and two real numbers $\beta_\pm$ such that

$$0 < \beta_+ < \frac{\int_0^{2\pi}h_+(\theta)\,d\theta}{\int_0^{2\pi}h_-(\theta)\,d\theta} < \beta_-$$

and we set

$$g(\theta) := \begin{cases}1_{\{h>0\}}(\theta)+\beta_+1_{\{h<0\}}(\theta) & \text{if } \theta\ge\theta_0^*,\\ 1_{\{h>0\}}(\theta)+\beta_-1_{\{h<0\}}(\theta) & \text{if } \theta<\theta_0^*.\end{cases}$$

The function

$$\theta\longmapsto g(\theta)h(\theta) = h_+-\beta_\pm h_-$$

is continuous and "positively" $2\pi$-periodic on $[\theta_0^*,\infty)$ and "negatively" $2\pi$-periodic on $(-\infty,\theta_0^*]$. Moreover, $gh(\theta) = 0$ iff $h(\theta) = 0$ so that $gh(\theta_0^*) = gh(\theta_0^*-) = 0$, which ensures on the way the continuity of $gh$ on $\mathbb{R}$. Furthermore $\int_0^{2\pi}gh(\theta)\,d\theta > 0$ and $\int_{-2\pi}^{0}gh(\theta)\,d\theta < 0$ so that, on the one hand,

$$\lim_{\theta\to\pm\infty}\int_0^{\theta}gh(u)\,du = +\infty$$

and, on the other hand, there exists a real constant $C > 0$ such that the function

$$L(\theta) = \int_0^{\theta}gh(u)\,du+C$$

is nonnegative. Its derivative is given by $L' = gh$ so that $L'h = gh^2\ge 0$ and $\{L'h = 0\} = \{h = 0\}$. It remains to prove that $L'$ is Lipschitz continuous. One checks by applying the usual differentiation theorem for functions defined by an integral that, if $\sigma_1\ne\sigma_2$ or $x_0^1\ne x_0^2$, then $P$ is differentiable on the whole real line, otherwise it is differentiable only on $\mathbb{R}\setminus 2\pi\mathbb{Z}$, and in both cases

$$P'(\theta) = \sigma_2\sqrt{T}\,e^{-rT}\,E\Big(1_{\{X_T^2>\max(X_T^1,K)\}}\,X_T^2\,\big(\cos(\theta)Z^2-\sin(\theta)Z^1\big)\Big).$$

Furthermore, with obvious notations, as soon as $P'(\theta)$ exists,

$$|P'(\theta)| \le \sigma_2\sqrt{T}\,e^{-rT}\,E\,\big|X_T^2\big(\cos(\theta)Z^2-\sin(\theta)Z^1\big)\big|.$$

The right-hand side of the inequality defines a $2\pi$-periodic continuous function, hence bounded on the real line. Consequently $|P'(\theta)|$ is bounded. It follows that the $2\pi$-periodic functions $h$ and $h_\pm$ are Lipschitz continuous, which implies in turn that $L' = gh$ is Lipschitz as well.

Moreover, one can show that the equation $P(\theta) = P_0^{market}$ has finitely many solutions on every interval of length $2\pi$. One may apply Theorem 2.1 to derive that $\theta_n$ will converge toward a solution $\theta^*$ of the equation $P(\theta) = P_0^{market}$.

Numerical illustration. We set the model parameters to the following values

$$x_0^i = 100, \quad r = 0.10, \quad \sigma_1 = \sigma_2 = 0.30, \quad \rho = -0.50$$

and the payoff parameters

$$T = 1, \quad K = 100.$$

The implicit correlation search recursive procedure is implemented with a sequence of some quasi-random normal numbers, namely

$$(\zeta_n^1,\zeta_n^2) = \Big(\sqrt{-2\log(\xi_n^1)}\,\sin(2\pi\xi_n^2),\ \sqrt{-2\log(\xi_n^1)}\,\cos(2\pi\xi_n^2)\Big)$$

where $\xi_n = (\xi_n^1,\xi_n^2)$, $n\ge 1$, is simply a regular 2-dimensional Halton sequence (see [32] for a definition).



Figure 1: B-S Best-of-Call option. $T = 1$, $r = 0.10$, $\sigma_1 = \sigma_2 = 0.30$, $x_0^1 = x_0^2 = 100$, $K = 100$. Left: convergence of $\theta_n$ toward a $\theta^*$ (up to $n = 10000$). Right: convergence of $\rho_n := \cos(\theta_n)$ toward $-0.5$. Both panels plot the MC and QMC procedures against the number of simulations.

The Black-Scholes reference price 30.75 is used as a market price so that the target of the stochastic algorithm is $\theta^*\in\arccos(-0.5)$. The parameters of the stochastic approximation procedure are



6*0 = 0, n 
The choice of ^o is "blind" on purpose (see Figure 1) 



10°, 7n = -, n > 1. 
n 
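
The quasi-stochastic procedure described in this section can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the Halton bases (2, 3), the gain constant `gain` and the default number of iterations are assumptions, while the model parameters and the quote 30.75 are those given in the text. Halton points are turned into quasi-normal pairs by the Box-Muller map and fed to the recursion $\theta_n = \theta_{n-1}-\gamma_nH(\theta_{n-1},\zeta_n)$ with $\gamma_n = \text{gain}/n$.

```python
import math


def van_der_corput(n, base):
    """n-th term (n >= 1) of the van der Corput sequence in the given base."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x


def box_muller(u1, u2):
    """Map a point of (0,1)^2 to a pair of (quasi-)normal numbers."""
    radius = math.sqrt(-2.0 * math.log(u1))
    return radius * math.sin(2.0 * math.pi * u2), radius * math.cos(2.0 * math.pi * u2)


# Model and payoff parameters quoted in the text.
X0, STRIKE, RATE, SIGMA, MATURITY = 100.0, 100.0, 0.10, 0.30, 1.0
MARKET_PRICE = 30.75                       # reference premium used as market quote
MU = RATE - 0.5 * SIGMA ** 2


def H(theta, z):
    """Discounted best-of call payoff with rho = cos(theta), minus the quote."""
    z1, z2 = z
    vol = SIGMA * math.sqrt(MATURITY)
    x1 = X0 * math.exp(MU * MATURITY + vol * z1)
    x2 = X0 * math.exp(MU * MATURITY + vol * (z1 * math.cos(theta) + z2 * math.sin(theta)))
    payoff = max(max(x1, x2) - STRIKE, 0.0)
    return math.exp(-RATE * MATURITY) * payoff - MARKET_PRICE


def implied_correlation(n_iter=100_000, theta0=0.0, gain=0.5):
    """Run theta_n = theta_{n-1} - (gain / n) * H(theta_{n-1}, zeta_n); return cos(theta)."""
    theta = theta0
    for n in range(1, n_iter + 1):
        xi = (van_der_corput(n, 2), van_der_corput(n, 3))   # n-th Halton point
        theta -= (gain / n) * H(theta, box_muller(*xi))
    return math.cos(theta)


if __name__ == "__main__":
    print("rho estimate:", implied_correlation())
```

Replacing the Halton point by a pair of pseudo-random uniforms gives the plain Monte Carlo counterpart of the same recursion.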
5.2 Recursive computation of the VaR and the CVaR

Another example of application is the recursive computation of the two best known and most common financial risk measures: the Value-at-Risk (VaR) and the Conditional Value-at-Risk (CVaR). These risk measures evaluate the extreme losses of a portfolio potentially faced by traders. The recursive computation of the VaR and the CVaR was introduced in [5], based on their formulation as an optimization problem (see [37]) and on an unconstrained importance sampling procedure developed in [29]. These variance reduction aspects are not investigated here.

5.2.1 Definitions and formulation 

Let $Y : (\Omega,\mathcal{A},P)\to\mathbb{R}$ be a random variable representative of a loss ($Y\ge 0$ stands for a loss equal to $Y$).

Definition 5.1. The Value at Risk (at confidence level $\alpha\in(0,1)$, $\alpha\approx 1$) of a given portfolio is the (lowest) $\alpha$-quantile of the distribution of $Y$ i.e.

$$VaR_\alpha(Y) := \inf\{\theta\ |\ P(Y\le\theta)\ge\alpha\}.$$

As soon as the distribution function of $Y$ has no atom, the value at risk satisfies $P(Y\le VaR_\alpha(Y)) = \alpha$ and, if the distribution function $F_Y$ of $Y$ is also (strictly) increasing, then it is the unique solution. As this risk measure is not coherent (see [14]), another, coherent, risk measure is provided by the Conditional Value at Risk when $Y\in L^1(P)$ with a continuous distribution (no atom).

Definition 5.2. Let $Y\in L^1(P)$ with an atomless distribution. The Conditional Value at Risk (at level $\alpha$) is the conditional expectation of the portfolio loss $Y$ beyond $VaR_\alpha(Y)$, i.e.

$$CVaR_\alpha(Y) := E\,[Y\ |\ Y\ge VaR_\alpha(Y)].$$

The following formulation of $VaR_\alpha(Y)$ and $CVaR_\alpha(Y)$ as solutions to an optimization problem is due to Rockafellar and Uryasev in [37].

Proposition 5.1. (Rockafellar and Uryasev) Let $Y\in L^1(P)$ with an atomless distribution. The function $V : \theta\mapsto\theta+\frac{1}{1-\alpha}E\,(Y-\theta)_+$ is convex, and

$$CVaR_\alpha(Y) = \min_\theta\Big(\theta+\frac{1}{1-\alpha}E\,(Y-\theta)_+\Big) \qquad \text{with} \qquad VaR_\alpha(Y) = \inf\,\mathrm{argmin}_\theta\Big(\theta+\frac{1}{1-\alpha}E\,(Y-\theta)_+\Big).$$

5.2.2 Stochastic gradient for the computation of both $VaR_\alpha(Y)$ and $CVaR_\alpha(Y)$

▷ Computation of the $VaR_\alpha(Y)$. What precedes suggests implementing a stochastic gradient descent derived from the above convex objective function $V(\theta) = \theta+\frac{1}{1-\alpha}E\,(Y-\theta)_+$. Assume that $Y\in L^1(P)$ with a continuous increasing distribution function $F_Y$ (for the sake of simplicity, see [5] for a slightly more general framework). Let $\nu = \mathcal{L}(Y)$. We check that

$$\lim_{\theta\to+\infty}\frac{V(\theta)}{\theta} = 1 \qquad \text{and} \qquad \lim_{\theta\to+\infty}\frac{V(-\theta)}{\theta} = \frac{\alpha}{1-\alpha} \qquad \text{hence} \qquad \lim_{\theta\to\pm\infty}V(\theta) = +\infty,$$

so that $\{VaR_\alpha(Y)\} = \mathrm{argmin}_{\mathbb{R}}V$. We check that $V'(\theta) = E\,[H(\theta,Y)]$ where

$$H(\theta,y) := 1-\frac{1}{1-\alpha}1_{\{y\ge\theta\}}.$$

Note that $H$ is uniformly bounded by $1\vee\frac{\alpha}{1-\alpha}$. This leads to devise the stochastic gradient descent

$$\theta_{n+1} = \theta_n-\gamma_{n+1}H(\theta_n,Y_n), \quad n\ge 0, \qquad \theta_0\in L^1(P),$$

whose unique target is $\theta^* = VaR_\alpha(Y)$. It is clear that, for every $y\in\mathbb{R}$, $\theta\mapsto H(\theta,y)$ is nondecreasing so that $L(\theta) = \frac12(\theta-\theta^*)^2$ is a good candidate as a Lyapunov function. In fact it is even a strict pathwise Lyapunov function in the sense of (2.15) by setting, for every $\delta > 0$, $\Psi_\delta(\theta) = \delta\,1_{\{|\theta-\theta^*|>\delta\}}$ and $\chi_\delta(y) = 1_{\{|y-\theta^*|\le\delta\}}$.

As soon as $(Y_n)_{n\ge 0}$ is $\nu$-averaging, there exists a sequence $(\varepsilon_n)_{n\ge 1}$ such that, for every $\theta\in\mathbb{R}$, $1_{\{y\ge\theta\}}\in\mathcal{V}_{\varepsilon_n,2}$ since the empirical distribution measure a.s. (and subsequently in $L^2$) converges uniformly toward $F_Y$. Finally, as soon as the step sequence $(\gamma_n)_{n\ge 1}$ is admissible for $(\varepsilon_n)_{n\ge 1}$, Theorem 2.1 implies that

$$\theta_n \xrightarrow{\ a.s.\ } VaR_\alpha(Y).$$

In practice $\gamma_n = \frac{c}{n}$, $c > 0$, is always admissible given the rate of convergence of the empirical measure in usual applications. Of course, when the $Y_n$ are i.i.d., standard martingale arguments "à la Robbins-Monro" make things straightforward under less stringent assumptions on the step sequence.

▷ Computation of the $CVaR_\alpha(Y)$. The idea to compute the $CVaR_\alpha(Y)$ is to devise a companion procedure of the above stochastic gradient by setting $C_0 = 0$ and, for every $n\ge 0$,

$$C_{n+1} = C_n-\frac{1}{n+1}\big(C_n-v(\theta_n,Y_n)\big) \qquad \text{with} \qquad v(\theta,y) := \theta+\frac{(y-\theta)_+}{1-\alpha}.$$

One checks that for every $n\ge 1$,

$$C_n = \frac1n\sum_{k=0}^{n-1}v(\theta_k,Y_k) = \frac1n\sum_{k=0}^{n-1}v(\theta^*,Y_k)+\frac1n\sum_{k=0}^{n-1}\big(v(\theta_k,Y_k)-v(\theta^*,Y_k)\big).$$

Using that $v$ is Lipschitz continuous in $\theta$ uniformly in $y$, we derive that the second term on the right-hand side of the above equality goes to 0 a.s. as $\theta_n\to\theta^*$ a.s.

As concerns the first term on the right-hand side, first note that $v(\theta^*,y)$ has a linear growth in $y$ so it will a.s. go to $E\,v(\theta^*,Y) = V(\theta^*) = CVaR_\alpha(Y)$ as soon as, e.g., $\sup_{n\ge 1}\frac1n\sum_{k=0}^{n-1}|Y_k|^{1+\eta} < +\infty$ a.s. for some $\eta > 0$, by combining standard uniform integrability arguments (with respect to the empirical measure) and the $\nu$-stability of $(Y_n)_{n\ge 0}$. In practice one must keep in mind that an adaptive importance sampling procedure like that detailed in [5] should be added. For a QMC implementation of the procedure, see [15].
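
The two coupled recursions can be sketched in a few lines. Everything specific below is an assumption made for the example: the innovations are a stationary Gaussian AR(1) sequence with $\mathcal{N}(0,1)$ marginal (a simple non-i.i.d. averaging sequence), the level is $\alpha = 0.95$, and the step is $\gamma_{n+1} = 1/(n+1+n_0)$, a shifted version of $c/n$ used only to tame the first few large moves of size $\frac{\alpha}{1-\alpha}\gamma$.

```python
import math
import random


def var_cvar(n_iter=100_000, alpha=0.95, rho=0.5, n0=100, seed=1):
    """Recursive VaR/CVaR estimates driven by a stationary AR(1) loss sequence."""
    rng = random.Random(seed)
    y = rng.gauss(0.0, 1.0)               # Y_0 drawn from the stationary law N(0,1)
    theta, cvar = 0.0, 0.0                # theta_0 and C_0
    for n in range(n_iter):
        # Companion average: C_{n+1} = C_n - (C_n - v(theta_n, Y_n)) / (n + 1)
        v = theta + max(y - theta, 0.0) / (1.0 - alpha)
        cvar -= (cvar - v) / (n + 1)
        # Stochastic gradient: theta_{n+1} = theta_n - gamma_{n+1} H(theta_n, Y_n)
        h = 1.0 - (1.0 if y >= theta else 0.0) / (1.0 - alpha)
        theta -= h / (n + 1 + n0)
        # Next innovation of the (hypothetical) AR(1) sequence
        y = rho * y + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return theta, cvar


if __name__ == "__main__":
    var_hat, cvar_hat = var_cvar()
    print("VaR estimate :", var_hat)
    print("CVaR estimate:", cvar_hat)
```

For a standard normal loss the targets are the 95% quantile, about 1.645, and the corresponding conditional expectation, about 2.063, which gives a reference point for the two estimates.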

5.3 Long term investment evaluation (and inhomogeneous Markov innovations) 

In this example we deal with averaging inhomogeneous Markov innovations, namely the Euler scheme with decreasing step of a Brownian diffusion. To describe the functional class $\mathcal{V}_{\varepsilon_n,p}$, we rely on an approach developed in [20] and [27] to compute the invariant measure of a diffusion.

5.3.1 Computation of the invariant distribution of a diffusion

We consider a stochastic recursive algorithm, introduced in [20], for the computation of the invariant distribution $\nu$ of a Brownian diffusion process

$$dY_t = b(Y_t)\,dt+\sigma(Y_t)\,dW_t \qquad (5.37)$$

where $b : \mathbb{R}^q\to\mathbb{R}^q$ and $\sigma : \mathbb{R}^q\to\mathcal{M}_{q,\ell}(\mathbb{R})$ (matrices with $q$ rows and $\ell$ columns) are Lipschitz continuous, and $W$ is an $\ell$-dimensional Brownian motion. We denote by $A$ its infinitesimal generator and by $(P_t)_{t\ge 0}$ its transition semi-group.
First, we introduce the Euler discretization of (5.37) with a step $\gamma_n$ vanishing to 0, i.e.

$$\forall\,n\in\mathbb{N}, \qquad Y_{n+1} = Y_n+\gamma_{n+1}\,b(Y_n)+\sqrt{\gamma_{n+1}}\,\sigma(Y_n)\,U_{n+1}, \qquad (5.38)$$

where $Y_0\in L^0_{\mathbb{R}^q}(\Omega,\mathcal{A},P)$ and $(U_n)_{n\ge 1}$ is an $\mathbb{R}^\ell$-valued normalized white noise defined on the probability space $(\Omega,\mathcal{A},P)$, independent of $Y_0$. The step sequence $\gamma := (\gamma_n)_{n\ge 1}$ satisfies the conditions

$$\forall\,n\ge 1,\ \gamma_n > 0, \qquad \lim_n\gamma_n = 0 \qquad \text{and} \qquad \Gamma_n := \sum_{k=1}^{n}\gamma_k \to +\infty. \qquad (5.39)$$

For every $n\ge 1$ and every $\omega\in\Omega$, set

$$\nu_n(\omega,dy) := \frac1n\sum_{k=0}^{n-1}\delta_{Y_k(\omega)}. \qquad (5.40)$$

We will use $\nu_n(\omega,f)$, which can be computed recursively, to approximate $\nu(f)$.

Definition 5.3. (Strong condition of stability) A diffusion with generator $A$ satisfies a strong stability condition of type $(V,\alpha)$ if there exists a (so-called Lyapunov) function $V\in C^2(\mathbb{R}^q,[1,+\infty[)$ such that $\lim_{|y|\to+\infty}V(y) = +\infty$ and $\exists\,\alpha > 0$, $\exists\,\beta > 0$ such that $AV\le-\alpha V+\beta$.

Remark. If the $(V,\alpha)$-strong stability condition holds then (5.37) admits a strong solution starting from any $y\in\mathbb{R}^q$ and admits at least one invariant distribution $\nu$ (i.e. $\nu P_t = \nu$, $t\ge 0$).

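Definition 5.4. (a) A couple $(\gamma,\eta)$ is an averaging step-weight system if the sequences $(\gamma_n)_{n\ge 1}$ and $(\eta_n)_{n\ge 1}$ are nonnegative, general terms of non-converging series, and such that

$$\lim_n\gamma_n = 0, \qquad \sum_{n\ge 1}\frac{1}{H_n}\Big(\Delta\frac{\eta_{n+1}}{\gamma_{n+1}}\Big)_+ < +\infty \qquad \text{and} \qquad \sum_{n\ge 1}\Big(\frac{\eta_n}{H_n\sqrt{\gamma_n}}\Big)^2 < +\infty,$$

where $H_n = \sum_{k=1}^{n}\eta_k$.

(b) In particular, if $\eta_n = 1$, then $(\gamma,1)$ is an averaging step-weight system if

$$\lim_n\gamma_n = 0, \qquad \sum_{n\ge 1}\frac1n\Big(\frac{1}{\gamma_{n+1}}-\frac{1}{\gamma_n}\Big)_+ < +\infty \qquad \text{and} \qquad \sum_{n\ge 1}\frac{1}{n^2\gamma_n} < +\infty.$$

The terminology "averaging" refers here to the fact that if $A$ is $(V,\alpha)$-stable (and the invariant distribution $\nu$ is unique for the sake of simplicity) then, as soon as $(\gamma,\eta)$ is averaging (see e.g. [20], [27] or [28]),

$$\sup_n E\,V(Y_n) < +\infty \qquad \text{and} \qquad P(d\omega)\text{-a.s.} \quad \nu_n^\eta(\omega,dy) := \frac{1}{H_n}\sum_{k=0}^{n-1}\eta_{k+1}\,\delta_{Y_k(\omega)} \Longrightarrow \nu.$$

Example. If $\gamma_n = \frac{\gamma_0}{n^r}$, $0 < r < 1$, and $\eta_n = 1$, then $(\gamma,1)$ is averaging.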
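
As a rough illustration of this averaging property, the sketch below runs an Euler scheme with decreasing step for a one-dimensional Ornstein-Uhlenbeck diffusion and computes the equally weighted empirical mean of a test function. The diffusion $dY_t = -Y_t\,dt+\sqrt{2}\,dW_t$ (invariant law $\mathcal{N}(0,1)$), the step $\gamma_n = \gamma_0n^{-1/3}$ with $\gamma_0 = 0.5$ and the test function $f(y) = y^2$ are all assumptions chosen so that the limit $\nu(f) = 1$ is known.

```python
import math
import random


def euler_empirical_mean(f, n_iter=100_000, gamma0=0.5, r=1.0 / 3.0, seed=2):
    """Equally weighted average of f along a decreasing-step Euler scheme (OU case)."""
    rng = random.Random(seed)
    y, total = 0.0, 0.0                      # Y_0 = 0
    for n in range(n_iter):
        total += f(y)                        # nu_n averages f(Y_0), ..., f(Y_{n-1})
        gamma = gamma0 / (n + 1) ** r        # gamma_{n+1}
        # Euler step for dY = -Y dt + sqrt(2) dW with Gaussian white noise
        y += -gamma * y + math.sqrt(2.0 * gamma) * rng.gauss(0.0, 1.0)
    return total / n_iter


if __name__ == "__main__":
    print("nu_n(y^2) =", euler_empirical_mean(lambda y: y * y))
```

The running average can equally be updated on the fly, which is what makes the sequence usable as an innovation process inside a stochastic approximation procedure.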

We assume that the diffusion $(Y_t)_{t\ge 0}$ satisfies a strong condition of stability of type $(V,\alpha)$ with $V$ sub-quadratic and that the invariant measure $\nu$ is unique. Besides, the coefficients $b$ and $\sigma$ satisfy $|b|^2+\mathrm{Tr}(\sigma\sigma^*) = O(V)$. Then the Euler scheme with decreasing step $(\gamma_n)_{n\ge 1}$ defined by (5.38) satisfies a strong condition of stability of type $(W,n_0)$ where $W$ is a function depending upon $V$ and the moments of $U_1$, namely

$$\forall\,n\ge n_0, \qquad E\,[W(Y_{n+1})\ |\ \sigma(Y_k,\ 0\le k\le n)] \le (1-\alpha\gamma_{n+1})\,W(Y_n)+\beta\gamma_{n+1}.$$

▷ If $U_1$ is sub-normal (typically if $U_1$ is normal), and $\mathrm{Tr}(\sigma\sigma^*)\le C_\sigma V^{1-p}$, we may choose $W = \exp(\lambda V^p)$ with $\lambda$ small enough (see [27] Proposition III.2 p. 36).

▷ If $U_1$ has a moment of order $2(p+1)$, $p\ge 2$, then $W = V^{p+1}$ (idem).

Assume that the function $f : \mathbb{R}^q\to\mathbb{R}$ admits a regular enough solution $\phi$ to the Poisson equation

$$A\phi = -(f-\nu(f)), \qquad (5.41)$$

i.e. belonging to the set 



£. 



p,W 



C^CM^M), Vj G {0, . . . ,p}, Vy G M", \D^ cp{y)\^ = o (^) } 



and satisfying D^cf) Lipschitz. For such functions 0, let us to define the functions D„, 3 < (7 < p, by 



Vy G M^ D,{y) = ^ ^Z?^0(y) • ((6(y))®('^-^) ,E 



(^(y)^^)«(2.-.) 



They will appear in the development of the error of order p, p > 3. 

Theorem 5.1. Let p > 2 such that Ui G L^^^^-^' and 4> G Sp^^/ solution to Poisson equation (5.41) 
such that DP(p is Lipschitz. Define q* by 

q* = min {D„ / 0} A (p + 1). 
<?e{3,...,p} 

(Note that if Ui ~ A/'(0, Jg) then q* = A). Let f\?'^ = Y2=i ll, /3 G M. Assume that the couples (7, 1) 
and (7, i) are averaging and that (7n)n>i is non-increasing. If q* < p and 



f-(9*/2-l) 
J- n 



r, 



/l\ n— !>+oo 



eG]0,+oo] 



-1 



^n 7n ) ) '^s non-mcreasmg, 

n>l 



J2 ^(n*/2-l) ^^ < +°° «^^ E 



n>l ■'-ri 



">1 7n I in 



< +00, 



then 



f G Ve„ 9 TO^/i Sn 



f('?*/2-l) 



0. 



n n— 5>+oo 

Corollary 5.1. // 7„ = ^, 70 > 0, the above theorem holds true when < r < ^rrj cLnd e„ 
n~'^yi /2~i)_ jjj particular, for a Gaussian Euler scheme, En = n~^ . 

Sketch of proof of Theorem 5.1. Proposition V.4 in [27] gives 

n-l 

= ||M„ + 5„||2 + o(l) 



r^ 



" i)(^E/(^^-)-K/) 



(gV2-l) 



n 



fc=o 



where M„ = ,,,,. J] W- (V0(yfc_i) | ^(^-1)^^) and 5„ = j^ -(,./2-i) E%'"'-P'/(^fc-i) 



^(g*/2-l) -^ V 7A- 
in fc=i * "<^ 



q=q 



< p{gV2-l) 

* ■■-" fc=l 



Using that (/> G Epyv-, i-^- that for every q, \Dg\ = o{W), and that sup„EVF(y„) < +00 (according 
to the stability condition of the Euler scheme), we get (see the remark after Proposition V.l p. 62 in 

[27]), 



T]? ' ' 0<k<n-l ^ 



=-(-1) /f.(g*/2-i) 



since \/Tl^ '> /Tl? '^ '> — > ^"^ g]0,+oo[ and ||sup5„||2 < +00. 

n— >+oo jj 



D 
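5.3.2 Application to the minimization of a potential

The aim is to minimize a convex potential $V : \mathbb{R}^d\to\mathbb{R}$ having a minimum (e.g. because $\lim_{|\theta|\to+\infty}V(\theta) = +\infty$) assumed to be unique. We also assume that $V$ has a representation as the expectation with respect to the invariant distribution $\nu$ of an ergodic diffusion, say $Y$ defined above. Typically $V$ appears as the long run limit (under appropriate assumptions) of a functional through Birkhoff's Theorem:

$$V(\theta) = \lim_{T\to+\infty}\frac1T\int_0^{T}v(\theta,Y_{t+s})\,ds = E_\nu\Big[\int_0^1v(\theta,Y_s)\,ds\Big] = \int_{\mathbb{R}^q}v(\theta,y)\,\nu(dy).$$

We make the following assumptions:

(i) Pathwise convexity: $\forall\,y\in\mathbb{R}^q$, $\theta\mapsto v(\theta,y)$ is convex.

(ii) Integrability: $\forall\,\theta\in\mathbb{R}^d$, $v(\theta,\cdot)\in L^1(\nu)$.

(iii) Differentiability: $\forall\,\theta\in\mathbb{R}^d$, $\nabla_\theta v(\theta,y)$ exists.

(iv) Uniform integrability: $\forall\,\theta\in\mathbb{R}^d$, $\Big(\frac{|v(\theta,\cdot)-v(\theta',\cdot)|}{|\theta-\theta'|}\Big)_{\theta'\in[\theta-\eta_\theta,\theta+\eta_\theta]\setminus\{\theta\}}$, $\eta_\theta > 0$, is uniformly integrable.

Then (using uniqueness of $\theta^*$),

$$\theta^* = \mathrm{argmin}_{\theta\in\mathbb{R}^d}\int_{\mathbb{R}^q}v(\theta,y)\,\nu(dy) \qquad \text{iff} \qquad \int_{\mathbb{R}^q}\nabla_\theta v(\theta^*,y)\,\nu(dy) = 0.$$

At this stage the idea is to devise a stochastic gradient (gradient based recursive zero search) using the Gaussian Euler scheme $(Y_n)_{n\ge 0}$ of $Y$, with decreasing Euler step $\gamma_0n^{-\frac13}$, $\gamma_0 > 0$, as a $\nu$-averaging innovation process with rate $\varepsilon_n = \Gamma_n/n\to 0$ (where $\Gamma_n$ denotes the sum of the first $n$ Euler steps):

$$\forall\,n\ge 0, \qquad \theta_{n+1} = \theta_n-\gamma_{n+1}\nabla_\theta v(\theta_n,Y_n), \qquad \theta_0\in\mathbb{R}^d.$$

If $p\in[1,\infty)$ is such that $\nabla_\theta v$ satisfies the growth assumption (2.12) with $L(\theta) = |\theta-\theta^*|^2$, $\nabla_\theta v(\theta^*,\cdot)\in\mathcal{V}_{\varepsilon_n,p}$ and $(\gamma_n)_{n\ge 1}$ is admissible for the rate $\varepsilon_n$ given by Corollary 5.1, then Theorem 2.1 implies that $\theta_n\to\theta^*$ a.s.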

Toy numerical example. We consider a long-term investment project (see the example in [30]) 
which yields payoff at a rate that depends on the installed capacity level and on the value of an under- 
lying state process modeled with an ergodic diffusion. The process Y represents an economic indicator 
such as the asset demand or its discounted price. Our aim is to determine the capacity expansion 
strategy that maximizes the long-term average payoff resulting from the project operation. So it is 
an ergodic control problem in a microeconomic framework. In [30] is shown that this dynamical opti- 
mization problem is equivalent (see above) to a static optimization problem involving the stationary 
distribution v oiY and the (concave) running payoff function C, namely, still following [30], 

V0GM+, VyGM+, C{9,y) = y''9^ - c9 where a,/3G (0,1) and cG(0,oo). 

The term u"0^ can be identified to the so-called Cobb-Douqlas production function, while the term c9 
„.ea..,s the COS. of capita, use. Ot. .^. is to .»„„i» / (-I)(.".« - c.)K..) (so that of coutse 

Jm.1 

9* = ( — ^) 1-^!). Since Ve C{9,y) is singular at 9 = 0, we will introduce the increasing convex with 

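linear growth change of variable $\theta = \rho(\tilde\theta) := \frac12\big(\tilde\theta + (\tilde\theta^2+1)^{1/2}\big)$, from $\mathbb{R}$ onto $(0,\infty)$
and we consider

$$\nabla_{\tilde\theta} v(\tilde\theta,y) = -\nabla_\theta C\Big(\tfrac12\big(\tilde\theta + (\tilde\theta^2+1)^{1/2}\big),\,y\Big),\qquad \tilde\theta\in\mathbb{R},\ y\in\mathbb{R}_+.$$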
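The closed form quoted above for $\theta^*$ follows from the first-order condition of the static problem. As a short worked step, write $m_\alpha := \int_{\mathbb{R}_+} y^{\alpha}\,\nu(dy)$ (a notation introduced here only for convenience) and assume $0 < m_\alpha < +\infty$:

$$\frac{d}{d\theta}\int_{\mathbb{R}_+}\big(y^{\alpha}\theta^{\beta} - c\,\theta\big)\,\nu(dy) = \beta\,\theta^{\beta-1}m_\alpha - c = 0 \iff \theta^{1-\beta} = \frac{\beta\,m_\alpha}{c} \iff \theta^* = \Big(\frac{\beta\,m_\alpha}{c}\Big)^{\frac{1}{1-\beta}}.$$

Since $\beta\in(0,1)$, the map $\theta\mapsto\theta^{\beta}m_\alpha - c\,\theta$ is strictly concave on $(0,\infty)$, so this critical point is its unique maximizer.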
Still following [30], the dynamics of the underlying state process $Y$ is modeled by the one-dimensional CIR diffusion (whose diffusion coefficient is unfortunately not Lipschitz), namely

$$dY_t = \kappa(\vartheta - Y_t)\,dt + \sigma\sqrt{|Y_t|}\,dW_t,\qquad Y_0 > 0,$$

where $\kappa,\vartheta,\sigma > 0$ are constants satisfying $2\kappa\vartheta > \sigma^2$ so that $(Y_t)_{t\ge 0}$ is $(0,\infty)$-valued.
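The resulting stochastic gradient procedure with step $(\gamma_n)_{n\ge 1}$ reads

$$\forall\, n\ge 0,\quad \tilde\theta_{n+1} = \tilde\theta_n - \gamma_{n+1}\nabla_{\tilde\theta} v(\tilde\theta_n,\bar Y_n),\qquad \tilde\theta_0\in\mathbb{R}, \tag{5.42}$$

where $(\gamma_n)_{n\ge 1}$ is admissible with respect to $\varepsilon_n = \tilde\Gamma_n/n$ and $(\bar Y_n)_{n\ge 0}$ is the Euler scheme with step $\tilde\gamma_n = \tilde\gamma_0\, n^{-1/3}$ ($L(\theta) = |\theta-\theta^*|^2$ is still a pathwise Lyapunov function). One checks that $\nabla_{\tilde\theta} v$ satisfies (2.12) with $\phi(y) = C_{\alpha,\beta}\,y^{\alpha}$, $C_{\alpha,\beta} > 0$ and $p = 2$ since $\sup_n \mathbb{E}\,\bar Y_n^2 < +\infty$ and $\alpha\in(0,1)$.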
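The invariant distribution $\nu$ of $Y$ is a Gamma law whose density is given by

$$\nu(dy) = \frac{1}{\Gamma\!\big(\frac{2\kappa\vartheta}{\sigma^2}\big)}\Big(\frac{2\kappa}{\sigma^2}\Big)^{\frac{2\kappa\vartheta}{\sigma^2}} y^{\frac{2\kappa\vartheta}{\sigma^2}-1}\exp\Big(-\frac{2\kappa}{\sigma^2}\,y\Big)\mathbf{1}_{\{y>0\}}\,dy,$$

where $\Gamma$ is the gamma function. Thus we can compute the previous integral, namely

$$\int_{\mathbb{R}_+} y^{\alpha}\,\nu(dy) = \Big(\frac{\sigma^2}{2\kappa}\Big)^{\alpha}\frac{\Gamma\!\big(\frac{2\kappa\vartheta}{\sigma^2}+\alpha\big)}{\Gamma\!\big(\frac{2\kappa\vartheta}{\sigma^2}\big)} < +\infty,$$

so we have in fact a closed form for $\theta^*$ given by

$$\theta^* = \left(\frac{\beta\,\big(\frac{\sigma^2}{2\kappa}\big)^{\alpha}\,\Gamma\!\big(\frac{2\kappa\vartheta}{\sigma^2}+\alpha\big)}{c\,\Gamma\!\big(\frac{2\kappa\vartheta}{\sigma^2}\big)}\right)^{\frac{1}{1-\beta}}.$$

Figure 2 illustrates the convergence of the algorithm (the parameters are specified in the caption).

[Figure: Optimal Strategy]
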
Figure 2: Convergence towards the optimal capacity level of the investment project: $\kappa = 1$, $\vartheta = 1$, $\sigma = 1.5$, $\alpha = 0.8$, $\beta = 0.7$, $c = 0.5$, $n = 10^{\ldots}$, $\gamma_n \ldots$
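The toy example can be sketched numerically as follows. This is only an illustrative sketch, not the authors' implementation. The function names, the step choices `gamma0 / n` and `euler_gamma0 * n**(-1/3)`, the starting points, and the use of `abs(y)` to keep powers well defined when the Euler scheme becomes negative are all assumptions. The change of variable is taken to be $\rho(x) = \frac12\big(x + \sqrt{x^2+1}\big)$, which is also an assumption. Admissibility of the step in the sense of (2.14) is not checked here.

```python
import math
import random


def gamma_moment(alpha, kappa, vartheta, sigma):
    """E[Y^alpha] under the Gamma invariant law (shape 2*kappa*vartheta/sigma^2, rate 2*kappa/sigma^2)."""
    shape = 2.0 * kappa * vartheta / sigma**2
    scale = sigma**2 / (2.0 * kappa)
    return scale**alpha * math.exp(math.lgamma(shape + alpha) - math.lgamma(shape))


def theta_star(kappa, vartheta, sigma, alpha, beta, c):
    """Closed-form optimal capacity (beta * m_alpha / c)^(1 / (1 - beta))."""
    m_alpha = gamma_moment(alpha, kappa, vartheta, sigma)
    return (beta * m_alpha / c) ** (1.0 / (1.0 - beta))


def rho(x):
    """Assumed change of variable from R onto (0, inf): (x + sqrt(x^2 + 1)) / 2."""
    s = math.sqrt(x * x + 1.0)
    # For x < 0 use the algebraically equal form 1 / (2 (s - x)) to avoid cancellation.
    return 0.5 * (x + s) if x >= 0 else 0.5 / (s - x)


def run_procedure(n_iter, kappa, vartheta, sigma, alpha, beta, c,
                  gamma0=1.0, euler_gamma0=0.5, y0=1.0, x0=0.0, seed=0):
    """Zero search on the transformed variable x, driven by an Euler scheme of the CIR diffusion."""
    rng = random.Random(seed)
    y, x = y0, x0
    for n in range(1, n_iter + 1):
        theta = rho(x)
        # Minus the derivative of C(theta, y) = y^alpha theta^beta - c theta in theta.
        grad = -(beta * abs(y) ** alpha * theta ** (beta - 1.0) - c)
        x -= (gamma0 / n) * grad
        # Euler step for dY = kappa (vartheta - Y) dt + sigma sqrt(|Y|) dW with decreasing step h.
        h = euler_gamma0 * n ** (-1.0 / 3.0)
        y += kappa * (vartheta - y) * h + sigma * math.sqrt(abs(y) * h) * rng.gauss(0.0, 1.0)
    return rho(x)
```

For instance, `run_procedure(10**5, 1.0, 1.0, 1.5, 0.8, 0.7, 0.5)` can be compared with `theta_star(1.0, 1.0, 1.5, 0.8, 0.7, 0.5)`, using the parameter values of the caption of Figure 2.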
If one considers a basket of assets modeled by a Wishart process (see [8] and [17]), a similar long-term ergodic control process can be devised. Closed forms are no longer available for the static optimization problem. However, our numerical approach can be extended straightforwardly (provided one has at hand an efficient simulation method for Wishart processes, like that proposed in [17]).
5.4 The ergodic two-armed bandit

An application of the multiplicative setting is the so-called two-armed bandit algorithm introduced in mathematical psychology and learning automata (see [33, 31]) and more recently in asset allocation ([23]). Criteria for a.s. convergence under pure i.i.d. assumptions were obtained in [23, 21] and under ergodic assumptions in [38]. A penalized version of this algorithm is also studied in [22].

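This algorithm is defined as follows: at each step $n\ge 0$, one plays arm $A$ (resp. arm $B$) at random with probability $\theta_n$ (resp. $1-\theta_n$), where $\theta_0 = \theta\in(0,1)$ and $\theta_n$ is updated according to the following "rewarding" rule: for every $n\ge 0$,

$$\theta_{n+1} = \theta_n + \gamma_{n+1}\Big((1-\theta_n)\mathbf{1}_{\{U_{n+1}\le\theta_n\}\cap A_{n+1}} - \theta_n\mathbf{1}_{\{U_{n+1}>\theta_n\}\cap B_{n+1}}\Big) \tag{5.43}$$

where $(U_n)_{n\ge 1}$ is an i.i.d. sequence of uniform random variables, independent of $(A_n)_{n\ge 1}$ and $(B_n)_{n\ge 1}$, which are two sequences of (possibly dependent) events evaluating the performances of the arms $A$ and $B$ respectively ($A_n$ is the event "$A$'s performance is satisfactory at time $n$", idem for $B_n$ and $B$). This stochastic procedure can be rewritten in a canonical form as follows

$$\theta_{n+1} = \theta_n + \gamma_{n+1}\big(\mathbf{1}_{A_{n+1}} - \mathbf{1}_{B_{n+1}}\big)h(\theta_n) + \gamma_{n+1}\Delta M_{n+1},\qquad \theta_0 = \theta\in(0,1) \tag{5.44}$$

where $h(\theta) = \theta(1-\theta)$, $M_n := \sum_{k=1}^{n} m_k$, $M_0 := 0$, with

$$m_k := \mathbf{1}_{A_k}(1-\theta_{k-1})\big(\mathbf{1}_{\{U_k\le\theta_{k-1}\}} - \theta_{k-1}\big) + \mathbf{1}_{B_k}\,\theta_{k-1}\big((1-\theta_{k-1}) - \mathbf{1}_{\{U_k>\theta_{k-1}\}}\big).$$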
We make the assumption that $A$ outperforms $B$ on average, i.e. that $\nu(A) > \nu(B)$ where

$$\frac1n\sum_{k=1}^{n}\mathbf{1}_{A_k}\to\nu(A)\quad\text{and}\quad\frac1n\sum_{k=1}^{n}\mathbf{1}_{B_k}\to\nu(B)$$

and that these convergences hold at rate $\varepsilon_n$ satisfying (2.7). Then applying Theorem 2.2 with $Y_k := \mathbf{1}_{A_{k+1}} - \mathbf{1}_{B_{k+1}}$, $k\ge 0$, and $\chi(y) = \frac{y}{\nu(A)-\nu(B)}$, we get a first convergence result: as soon as $(\gamma_n)_{n\ge 1}$ is admissible in the sense of (2.14) for the sequence $(\varepsilon_n)_{n\ge 1}$,

$$\theta_n\xrightarrow{\ a.s.\ }\theta^*\in\{0,1\}$$

where 1 is the target and 0 is a trap. Further investigations on $\theta^*$ are carried out in [38] in this ergodic framework to analyze the fallibility of the algorithm, which extend former results established in [23, 21] in the purely i.i.d. setting.
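The rewarding rule (5.43) and its canonical decomposition (5.44) can be checked against each other numerically. The sketch below is illustrative only. It draws the events $A_n$, $B_n$ as independent Bernoulli variables (a special case of the ergodic setting, chosen here for simplicity) and uses the step $\gamma_n = 1/(n+1)$. The function names and both of these choices are assumptions.

```python
import random


def rewarding_increment(theta, u, a, b):
    """Bracket of (5.43): (1 - theta) 1{u <= theta, A} - theta 1{u > theta, B}."""
    ind_a = 1.0 if (u <= theta and a) else 0.0
    ind_b = 1.0 if (u > theta and b) else 0.0
    return (1.0 - theta) * ind_a - theta * ind_b


def canonical_increment(theta, u, a, b):
    """Right-hand side of (5.44) without the step: (1_A - 1_B) h(theta) + m."""
    one_a, one_b = float(bool(a)), float(bool(b))
    play_a = 1.0 if u <= theta else 0.0
    drift = (one_a - one_b) * theta * (1.0 - theta)
    m = (one_a * (1.0 - theta) * (play_a - theta)
         + one_b * theta * ((1.0 - theta) - (1.0 - play_a)))
    return drift + m


def run_bandit(n_iter, p_a, p_b, theta0=0.5, seed=0):
    """Iterate (5.43) with i.i.d. success events of probabilities p_a and p_b."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_iter + 1):
        gamma = 1.0 / (n + 1)  # step in (0, 1), so theta stays in [0, 1]
        u = rng.random()
        a = rng.random() < p_a
        b = rng.random() < p_b
        theta += gamma * rewarding_increment(theta, u, a, b)
    return theta
```

A call such as `run_bandit(10**5, 0.6, 0.4)` returns a value in $[0,1]$. Whether a given run ends near the target 1 or the trap 0 depends on the step sequence, which is the fallibility question discussed above.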

5.5 Optimal split of orders across liquidity pools 

This is an example of application in Finance to be implemented exclusively on real data. It is an optimal allocation problem which is solved by a stochastic Lagrangian approach originally developed in [26]. Here, only numerical results with real market data are presented.

5.5.1 Model description 

The principle of a Dark pool is to propose a price with no guarantee of executed quantity at the occasion of an OTC transaction. Usually this price is lower than the one offered on the regular market. So one can model the impact of the existence of $N$ dark pools ($N\ge 2$) on a given transaction as follows: let $V > 0$ be the random volume to be executed, let $\theta_i\in(0,1)$ be the discount factor proposed by the dark pool $i$. Let $r_i$ denote the percentage of $V$ sent to the dark pool $i$ for execution. Let $D_i\ge 0$ be the quantity of securities that can be delivered (or made available) by the dark pool $i$ at price $\theta_i S$.

The remainder of the order is to be executed on the regular market, at price $S$. Then the cost $C$ of the whole executed order is given by

$$C = S\sum_{i=1}^{N}\theta_i\min(r_iV, D_i) + S\Big(V - \sum_{i=1}^{N}\min(r_iV, D_i)\Big) = S\Big(V - \sum_{i=1}^{N}\rho_i\min(r_iV, D_i)\Big)$$

where $\rho_i = 1-\theta_i\in(0,1)$, $i = 1,\dots,N$. Minimizing the mean execution cost, given the price $S$, amounts to solving the following maximization problem



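$$\max\Big\{\sum_{i=1}^{N}\rho_i\,\mathbb{E}\big(S\min(r_iV, D_i)\big),\ r\in\mathcal{P}_N\Big\} \tag{5.45}$$

where $\mathcal{P}_N := \big\{r = (r_i)_{1\le i\le N}\in\mathbb{R}_+^N \mid \sum_{i=1}^{N} r_i = 1\big\}$. It is then convenient to include the price $S$ into both random variables $V$ and $D_i$ by considering $\tilde V := VS$ and $\tilde D_i := D_iS$ instead of $V$ and $D_i$.

Let $\mathcal{I}_N = \{1,\dots,N\}$. We set for all $r = (r_1,\dots,r_N)\in\mathcal{P}_N$, $\Phi(r_1,\dots,r_N) := \sum_{i=1}^{N}\varphi_i(r_i)$, where

$$\forall\, i\in\mathcal{I}_N,\quad \varphi_i(u) := \rho_i\,\mathbb{E}\big(\min(uV, D_i)\big),\quad u\in[0,1].$$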

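We assume that for all $i\in\mathcal{I}_N$,

$$V > 0\ \ \mathbb{P}\text{-a.s.},\quad \mathbb{P}(D_i > 0) > 0\ \text{ and the distribution function of } \frac{D_i}{V} \text{ is continuous on } \mathbb{R}_+, \tag{5.46}$$

then $\varphi_i$, $i\in\mathcal{I}_N$, are everywhere differentiable on the unit interval $[0,1]$ with

$$\varphi_i'(u) = \rho_i\,\mathbb{E}\big(\mathbf{1}_{\{uV<D_i\}}V\big),\quad u\in(0,1], \tag{5.47}$$

and one extends $\varphi_i$, $i\in\mathcal{I}_N$, on the whole real line into a concave nondecreasing function with $\lim_{\pm\infty}\varphi_i = \pm\infty$. So we can formally extend $\Phi$ on the whole affine hyperplane spanned by $\mathcal{P}_N$, i.e. $\mathcal{H}_N := \big\{r = (r_1,\dots,r_N)\in\mathbb{R}^N \mid \sum_{i=1}^{N} r_i = 1\big\}$.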

5.5.2 Design of the recursive procedure 

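We aim at solving the following maximization problem $\max_{r\in\mathcal{P}_N}\Phi(r)$. The Lagrangian associated to the sole affine constraint suggests that $r^*\in\operatorname{argmax}_{\mathcal{P}_N}\Phi$ iff $\varphi_i'(r_i^*)$ is constant when $i$ runs over $\mathcal{I}_N$ or equivalently if $\varphi_i'(r_i^*) = \frac1N\sum_{j=1}^{N}\varphi_j'(r_j^*)$, $i\in\mathcal{I}_N$.

We set $Y^n := (V^n, D_1^n,\dots,D_N^n)_{n\ge 1}$. Then using the representation of the derivatives $\varphi_i'$ yields

$$r^*\in\operatorname{argmax}_{\mathcal{P}_N}\Phi \iff \forall\, i\in\{1,\dots,N\},\quad \mathbb{E}\Big(V\Big(\rho_i\mathbf{1}_{\{r_i^*V<D_i\}} - \frac1N\sum_{j=1}^{N}\rho_j\mathbf{1}_{\{r_j^*V<D_j\}}\Big)\Big) = 0.$$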



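Consequently, this leads to the following recursive zero search procedure

$$r_i^{n+1} = r_i^{n} + \gamma_{n+1}H_i(r^n, Y^{n+1}),\quad r^0\in\mathcal{P}_N,\ n\ge 0,\ i\in\mathcal{I}_N, \tag{5.48}$$

where for every $i\in\mathcal{I}_N$, $r\in\mathcal{P}_N$, every $V > 0$ and every $D_1,\dots,D_N\ge 0$,

$$H_i(r, Y) = V\Big(\rho_i\mathbf{1}_{\{r_iV<D_i\}} - \frac1N\sum_{j=1}^{N}\rho_j\mathbf{1}_{\{r_jV<D_j\}}\Big)$$

and where $(Y^n)_{n\ge 1}$ is a sequence of random vectors with nonnegative components such that, for every $n\ge 1$, $(V^n, D_i^n, i = 1,\dots,N)\overset{d}{=}(V, D_i, i = 1,\dots,N)$.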

The underlying idea of the algorithm is to reward the dark pools which outperform the mean of the $N$ dark pools by increasing the allocated volume sent at the next step (and conversely). For the sake of simplicity, we assume that $\operatorname{argmax}_{\mathcal{P}_N}\Phi = \{r^*\}\subset\operatorname{int}(\mathcal{P}_N)$. Our "light" $\nu$-averaging assumption is to assume that there exists an exponent $\eta\in(0,1]$ such that for every $u\in\mathbb{R}_+$ and every $i\in\mathcal{I}_N$

$$\frac1n\sum_{k=1}^{n}V^k\mathbf{1}_{\{u<\frac{D_i^k}{V^k}\}} - \mathbb{E}\Big(V\mathbf{1}_{\{u<\frac{D_i}{V}\}}\Big) = O(n^{-\eta})\quad\text{a.s. and in } L^2(\mathbb{P}) \tag{5.49}$$

(which holds under geometric $\alpha$-mixing assumptions on $(D_i^n, V^n)_{n\ge 1}$). Under additional technical assumptions on the support of $\mathcal{L}(Y^n)$ (see [26]), we can apply Theorem 2.1: if the sequence $(\gamma_n)_{n\ge 1}$ satisfies (2.14), we get that the algorithm defined by (5.48) a.s. converges towards $r^* = \operatorname{argmax}_{\mathcal{P}_N}\Phi$.
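A minimal sketch of the recursive procedure (5.48) is given below. It is illustrative only. The field `H` follows the zero-search condition stated above, while the step `gamma0 / n`, the function names, and the synthetic generator `synthetic_data` are assumptions; the paper itself works with real market data. The sketch does not project onto $\mathcal{P}_N$. It relies only on the fact that the components of `H` sum to zero, so the iterates remain in the hyperplane $\sum_i r_i = 1$.

```python
import random


def H(r, V, D, rho):
    """Field of (5.48): H_i = V * (rho_i 1{r_i V < D_i} - average of these terms over the pools)."""
    terms = [rho_i * (1.0 if r_i * V < D_i else 0.0)
             for r_i, D_i, rho_i in zip(r, D, rho)]
    mean = sum(terms) / len(terms)
    return [V * (t - mean) for t in terms]


def run_split(data, rho, gamma0=0.05):
    """Iterate r^{n+1} = r^n + gamma_{n+1} H(r^n, Y^{n+1}) starting from the uniform allocation."""
    n_pools = len(rho)
    r = [1.0 / n_pools] * n_pools
    for n, (V, D) in enumerate(data, start=1):
        step = gamma0 / n  # hypothetical step; its scale should match the scale of V
        h = H(r, V, D, rho)
        r = [r_i + step * h_i for r_i, h_i in zip(r, h)]
    return r


def synthetic_data(n, n_pools, seed=0):
    """Hypothetical i.i.d. innovations in a shortage setting (E V > sum of E D_i)."""
    rng = random.Random(seed)
    for _ in range(n):
        V = 1.0 + rng.random()
        D = [0.3 * rng.random() for _ in range(n_pools)]
        yield V, D
```

For example, `run_split(synthetic_data(10000, 4), [0.0, 0.02, 0.04, 0.06])` returns an allocation vector whose components still sum to one.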

5.5.3 Numerical Tests 

We consider the shortage setting, i.e. $\mathbb{E}V > \sum_{i=1}^{N}\mathbb{E}D_i$, because it is the most interesting case and the most common in the market. Now, we introduce an index to measure the performances of our recursive allocation procedure.

$\triangleright$ Relative cost reduction (w.r.t. the regular market): it is defined as the ratio between the cost reduction of the execution using dark pools and the cost resulting from an execution on the regular market, i.e., for every $n\ge 1$,

$$\frac{\sum_{i=1}^{N}\rho_i\min(r_i^nV^n, D_i^n)}{V^n}.$$

We have considered for $V$ the traded volumes of a very liquid security (namely the asset BNP) during an 11 day period. Then we selected the $N$ most correlated assets (in terms of traded volumes) with the original asset. These assets are denoted $S_i$, $i = 1,\dots,N$, and we considered their traded volumes during the same 11 day period. Finally, the available volumes of each dark pool $i$ have been modeled as follows using the mixing function

$$\forall\, 1\le i\le N,\quad D_i := \beta_i\Big((1-\alpha_i)V + \alpha_i S_i\frac{\mathbb{E}V}{\mathbb{E}S_i}\Big)$$

where $\alpha_i$, $i = 1,\dots,N$, are the recombining coefficients, $\beta_i$, $i = 1,\dots,N$, some scaling factors and $\mathbb{E}V$ and $\mathbb{E}S_i$ stand for the empirical means of the data sets of $V$ and $S_i$. The simulations presented here have been made with four dark pools ($N = 4$). Since the data used here cover 11 days, it is clear that, unlike the simulated data, these pseudo-real data are not stationary: in particular they are subject to daily changes of trend and volatility (at least). To highlight the resulting changes in the response of the algorithms, we have specified the days by drawing vertical dotted lines. The dark pool pseudo-data parameters are set to $\beta = (0.1, 0.2, 0.3, 0.2)^*$, $\alpha = (0.4, 0.6, 0.8, 0.2)^*$ and the dark pool trading (rebate) parameters are set to $\rho = (0.0, 0.02, 0.04, 0.06)^*$.
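The pseudo-data construction and the performance index can be written in a few lines. The sketch below is illustrative only. It assumes that the mixing function rescales $S_i$ by the ratio of empirical means $\mathbb{E}V/\mathbb{E}S_i$ and that the relative cost reduction is $\sum_i\rho_i\min(r_iV, D_i)/V$. The function and argument names are hypothetical.

```python
def mix_available_volume(V, S_i, alpha_i, beta_i, mean_V, mean_S_i):
    """Assumed mixing function: D_i = beta_i * ((1 - alpha_i) V + alpha_i S_i mean_V / mean_S_i)."""
    return beta_i * ((1.0 - alpha_i) * V + alpha_i * S_i * mean_V / mean_S_i)


def relative_cost_reduction(r, V, D, rho):
    """Cost reduction obtained through the dark pools, relative to the regular-market cost V."""
    return sum(rho_i * min(r_i * V, D_i) for r_i, D_i, rho_i in zip(r, D, rho)) / V
```

With the uniform allocation `r = [0.25] * 4`, `V = 10.0`, `D = [1.0, 5.0, 3.0, 0.5]` and the rebates $\rho$ above, the executed quantities are $(1, 2.5, 2.5, 0.5)$, so the index equals $(0.02\cdot 2.5 + 0.04\cdot 2.5 + 0.06\cdot 0.5)/10 = 0.018$.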

We benchmarked - see Figure 3 - the algorithm on the whole data set (11 days) as though it 
were stationary. In particular, the running means of the performances are computed from the very 
beginning for the first 1500 data, and then by a moving average computed on a window of 1500 data. 
[Figure: Relative Cost Reductions]

Figure 3: Case $N = 4$, $\sum_{i=1}^{N}\beta_i < 1$, $0.2\le\alpha_i\le 0.8$ and $r_i^0 = 1/N$, $1\le i\le N$.